{
 "cells": [
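  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem description\n",
    "\n",
    "Cluster the events by their keyword counts (the count_1, count_2, ..., count_100 and count_other attributes), for example with KMeans. \n",
    "Try K = 10, 20, 30, ..., 100 and compute the CH_scores for each K. \n",
    "\n",
    "## Hints\n",
    "\n",
    "Notes on the files: \n",
    "1. You can run 0. EDA.ipynb first to get an overview of all the competition data; \n",
    "2. There are too many events overall (3,000,000+ records), so it is enough to cluster only the events that appear in the training set train.csv and the test set test.csv (13,418 records).\n",
    "Running 1. Users_Events.ipynb yields the events that appear only in train.csv and test.csv; you can modify that code to save them in CSV format and then run the clustering. \n",
    "\n",
    "## Grading criteria\n",
    "1. Extracting only the events that appear in the training and test sets: 20 points \n",
    "2. Clustering: 40 points \n",
    "3. Computing the CH_scores: 20 points \n",
    "4. Presenting and analysing the results: 20 points"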
   ]
  },
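  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The K sweep described above can be sketched as follows. This is only a minimal illustration, not the solution used in this notebook: it assumes that CH_scores refers to the Calinski-Harabasz index as implemented by scikit-learn's `calinski_harabasz_score` (spelled `calinski_harabaz_score` in older releases), and it runs on a synthetic matrix `X` that merely stands in for the 101 keyword-count columns.\n",
    "\n",
    "```\n",
    "import numpy as np\n",
    "from sklearn.cluster import KMeans\n",
    "from sklearn.metrics import calinski_harabasz_score\n",
    "\n",
    "# Synthetic stand-in for the keyword counts (assumption, not the real data)\n",
    "rng = np.random.RandomState(0)\n",
    "X = rng.poisson(1.0, size=(500, 101)).astype(float)\n",
    "\n",
    "scores = {}\n",
    "for k in range(10, 101, 10):  # K = 10, 20, ..., 100\n",
    "    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)\n",
    "    scores[k] = calinski_harabasz_score(X, labels)\n",
    "\n",
    "# Pick the K with the largest score\n",
    "best_k = max(scores, key=scores.get)\n",
    "print(best_k, scores[best_k])\n",
    "```\n",
    "\n",
    "For the Calinski-Harabasz index a higher value indicates denser, better separated clusters, so the K with the largest score is the natural candidate."
   ]
  },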
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Data extraction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "events.csv is too large to push to git, so only the code is shown here.\n",
    "\n",
    "1. Collect the event IDs from the training and test data into a set\n",
    "2. Read all of events.csv into memory in one go (costly in both time and memory)\n",
    "3. Use pandas `.loc` to select the rows of events whose event_id is in the corresponding set\n",
    "4. Write each extracted subset to its own CSV file for later use\n",
    "\n",
    "```\n",
    "import pandas as pd\n",
    "\n",
    "def collectEventIds(filename):\n",
    "    # Collect the event IDs (second column) of a CSV file into a set\n",
    "    events = set()\n",
    "    with open(filename, 'r') as f:\n",
    "        f.readline()  # skip the header row (column names)\n",
    "        for line in f:  # one record per line\n",
    "            cols = line.strip().split(\",\")\n",
    "            # cast to int so the IDs compare equal to the int64 event_id column\n",
    "            events.add(int(cols[1]))  # the second column is the event ID\n",
    "    return events\n",
    "\n",
    "train_event_ids = collectEventIds(\"/Users/wuzhong/gitee/homework4/train.csv\")\n",
    "test_events_ids = collectEventIds(\"/Users/wuzhong/gitee/homework4/test.csv\")\n",
    "\n",
    "print(train_event_ids)\n",
    "\n",
    "# Load events.csv, then save the extracted subsets locally\n",
    "events = pd.read_csv(\"/Users/wuzhong/gitee/homework4/events.csv\")\n",
    "\n",
    "train_events = events.loc[events['event_id'].isin(train_event_ids)]\n",
    "print(\"train_events.info()\")\n",
    "train_events.info()  # info() prints its report itself and returns None\n",
    "train_events.to_csv(\"train_events.csv\")\n",
    "print(\"saved to train_events.csv\")\n",
    "\n",
    "test_events = events.loc[events['event_id'].isin(test_events_ids)]\n",
    "print(\"test_events.info()\")\n",
    "test_events.info()\n",
    "test_events.to_csv(\"test_events.csv\")\n",
    "print(\"saved to test_events.csv\")\n",
    "\n",
    "events.info()\n",
    "```"
   ]
  },
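  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step 2 above reads the whole file at once. If memory is tight, a chunked read is one possible alternative. The sketch below is an illustration under stated assumptions rather than the code used here: the helper `filter_events` and its `chunksize` default are hypothetical, and `wanted_ids` is assumed to hold integer IDs so that they match the integer `event_id` column.\n",
    "\n",
    "```\n",
    "import pandas as pd\n",
    "\n",
    "def filter_events(path, wanted_ids, chunksize=100000):\n",
    "    # Stream the CSV in chunks and keep only rows whose event_id is wanted\n",
    "    kept = []\n",
    "    for chunk in pd.read_csv(path, chunksize=chunksize):\n",
    "        kept.append(chunk[chunk['event_id'].isin(wanted_ids)])\n",
    "    # concat keeps the original row numbers as the index\n",
    "    return pd.concat(kept)\n",
    "\n",
    "# Hypothetical usage with an ID set collected from train.csv:\n",
    "# train_events = filter_events('events.csv', train_event_ids)\n",
    "```\n",
    "\n",
    "Only one chunk plus the matching rows is held in memory at a time, at the cost of a somewhat slower single pass over the file."
   ]
  },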
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Feature engineering on the extracted training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>event_id</th>\n",
       "      <th>user_id</th>\n",
       "      <th>lat</th>\n",
       "      <th>lng</th>\n",
       "      <th>c_1</th>\n",
       "      <th>c_2</th>\n",
       "      <th>c_3</th>\n",
       "      <th>c_4</th>\n",
       "      <th>c_5</th>\n",
       "      <th>c_6</th>\n",
       "      <th>...</th>\n",
       "      <th>c_92</th>\n",
       "      <th>c_93</th>\n",
       "      <th>c_94</th>\n",
       "      <th>c_95</th>\n",
       "      <th>c_96</th>\n",
       "      <th>c_97</th>\n",
       "      <th>c_98</th>\n",
       "      <th>c_99</th>\n",
       "      <th>c_100</th>\n",
       "      <th>c_other</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>8.846000e+03</td>\n",
       "      <td>8.846000e+03</td>\n",
       "      <td>5290.000000</td>\n",
       "      <td>5290.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>2.142609e+09</td>\n",
       "      <td>2.128939e+09</td>\n",
       "      <td>24.694942</td>\n",
       "      <td>-20.339798</td>\n",
       "      <td>2.406285</td>\n",
       "      <td>1.398372</td>\n",
       "      <td>1.279674</td>\n",
       "      <td>0.863780</td>\n",
       "      <td>1.199186</td>\n",
       "      <td>2.596428</td>\n",
       "      <td>...</td>\n",
       "      <td>0.062288</td>\n",
       "      <td>0.082071</td>\n",
       "      <td>0.093828</td>\n",
       "      <td>0.069184</td>\n",
       "      <td>0.078793</td>\n",
       "      <td>0.302736</td>\n",
       "      <td>0.079810</td>\n",
       "      <td>0.081054</td>\n",
       "      <td>0.072914</td>\n",
       "      <td>58.926407</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>1.234082e+09</td>\n",
       "      <td>1.260700e+09</td>\n",
       "      <td>21.430399</td>\n",
       "      <td>93.259494</td>\n",
       "      <td>23.632890</td>\n",
       "      <td>2.949282</td>\n",
       "      <td>2.764838</td>\n",
       "      <td>2.021167</td>\n",
       "      <td>19.272932</td>\n",
       "      <td>7.697351</td>\n",
       "      <td>...</td>\n",
       "      <td>0.304606</td>\n",
       "      <td>0.381958</td>\n",
       "      <td>0.399565</td>\n",
       "      <td>0.317505</td>\n",
       "      <td>0.525019</td>\n",
       "      <td>19.153367</td>\n",
       "      <td>0.359582</td>\n",
       "      <td>0.477520</td>\n",
       "      <td>0.341243</td>\n",
       "      <td>128.470916</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1.040700e+05</td>\n",
       "      <td>1.329876e+06</td>\n",
       "      <td>-50.333000</td>\n",
       "      <td>-157.863000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>1.088448e+09</td>\n",
       "      <td>1.013605e+09</td>\n",
       "      <td>3.578000</td>\n",
       "      <td>-95.304750</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>14.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>2.119257e+09</td>\n",
       "      <td>2.129289e+09</td>\n",
       "      <td>33.976500</td>\n",
       "      <td>-74.213000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>39.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>3.212027e+09</td>\n",
       "      <td>3.221692e+09</td>\n",
       "      <td>42.280500</td>\n",
       "      <td>98.688750</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>75.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>4.294677e+09</td>\n",
       "      <td>4.294033e+09</td>\n",
       "      <td>60.226000</td>\n",
       "      <td>174.777000</td>\n",
       "      <td>2186.000000</td>\n",
       "      <td>82.000000</td>\n",
       "      <td>85.000000</td>\n",
       "      <td>71.000000</td>\n",
       "      <td>1801.000000</td>\n",
       "      <td>306.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>10.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>23.000000</td>\n",
       "      <td>1801.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>16.000000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>9664.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>8 rows × 105 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           event_id       user_id          lat          lng          c_1  \\\n",
       "count  8.846000e+03  8.846000e+03  5290.000000  5290.000000  8846.000000   \n",
       "mean   2.142609e+09  2.128939e+09    24.694942   -20.339798     2.406285   \n",
       "std    1.234082e+09  1.260700e+09    21.430399    93.259494    23.632890   \n",
       "min    1.040700e+05  1.329876e+06   -50.333000  -157.863000     0.000000   \n",
       "25%    1.088448e+09  1.013605e+09     3.578000   -95.304750     0.000000   \n",
       "50%    2.119257e+09  2.129289e+09    33.976500   -74.213000     1.000000   \n",
       "75%    3.212027e+09  3.221692e+09    42.280500    98.688750     3.000000   \n",
       "max    4.294677e+09  4.294033e+09    60.226000   174.777000  2186.000000   \n",
       "\n",
       "               c_2          c_3          c_4          c_5          c_6  \\\n",
       "count  8846.000000  8846.000000  8846.000000  8846.000000  8846.000000   \n",
       "mean      1.398372     1.279674     0.863780     1.199186     2.596428   \n",
       "std       2.949282     2.764838     2.021167    19.272932     7.697351   \n",
       "min       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "25%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "50%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "75%       2.000000     2.000000     1.000000     1.000000     2.000000   \n",
       "max      82.000000    85.000000    71.000000  1801.000000   306.000000   \n",
       "\n",
       "          ...              c_92         c_93         c_94         c_95  \\\n",
       "count     ...       8846.000000  8846.000000  8846.000000  8846.000000   \n",
       "mean      ...          0.062288     0.082071     0.093828     0.069184   \n",
       "std       ...          0.304606     0.381958     0.399565     0.317505   \n",
       "min       ...          0.000000     0.000000     0.000000     0.000000   \n",
       "25%       ...          0.000000     0.000000     0.000000     0.000000   \n",
       "50%       ...          0.000000     0.000000     0.000000     0.000000   \n",
       "75%       ...          0.000000     0.000000     0.000000     0.000000   \n",
       "max       ...          6.000000     9.000000    10.000000     9.000000   \n",
       "\n",
       "              c_96         c_97         c_98         c_99        c_100  \\\n",
       "count  8846.000000  8846.000000  8846.000000  8846.000000  8846.000000   \n",
       "mean      0.078793     0.302736     0.079810     0.081054     0.072914   \n",
       "std       0.525019    19.153367     0.359582     0.477520     0.341243   \n",
       "min       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "25%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "50%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "75%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "max      23.000000  1801.000000     9.000000    16.000000     7.000000   \n",
       "\n",
       "           c_other  \n",
       "count  8846.000000  \n",
       "mean     58.926407  \n",
       "std     128.470916  \n",
       "min       0.000000  \n",
       "25%      14.000000  \n",
       "50%      39.000000  \n",
       "75%      75.000000  \n",
       "max    9664.000000  \n",
       "\n",
       "[8 rows x 105 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "train = pd.read_csv('./train_events.csv',index_col=0)\n",
    "train.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Keep only the event keyword counts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>c_1</th>\n",
       "      <th>c_2</th>\n",
       "      <th>c_3</th>\n",
       "      <th>c_4</th>\n",
       "      <th>c_5</th>\n",
       "      <th>c_6</th>\n",
       "      <th>c_7</th>\n",
       "      <th>c_8</th>\n",
       "      <th>c_9</th>\n",
       "      <th>c_10</th>\n",
       "      <th>...</th>\n",
       "      <th>c_92</th>\n",
       "      <th>c_93</th>\n",
       "      <th>c_94</th>\n",
       "      <th>c_95</th>\n",
       "      <th>c_96</th>\n",
       "      <th>c_97</th>\n",
       "      <th>c_98</th>\n",
       "      <th>c_99</th>\n",
       "      <th>c_100</th>\n",
       "      <th>c_other</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "      <td>8846.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>2.406285</td>\n",
       "      <td>1.398372</td>\n",
       "      <td>1.279674</td>\n",
       "      <td>0.863780</td>\n",
       "      <td>1.199186</td>\n",
       "      <td>2.596428</td>\n",
       "      <td>1.055505</td>\n",
       "      <td>0.569749</td>\n",
       "      <td>0.649220</td>\n",
       "      <td>0.540244</td>\n",
       "      <td>...</td>\n",
       "      <td>0.062288</td>\n",
       "      <td>0.082071</td>\n",
       "      <td>0.093828</td>\n",
       "      <td>0.069184</td>\n",
       "      <td>0.078793</td>\n",
       "      <td>0.302736</td>\n",
       "      <td>0.079810</td>\n",
       "      <td>0.081054</td>\n",
       "      <td>0.072914</td>\n",
       "      <td>58.926407</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>23.632890</td>\n",
       "      <td>2.949282</td>\n",
       "      <td>2.764838</td>\n",
       "      <td>2.021167</td>\n",
       "      <td>19.272932</td>\n",
       "      <td>7.697351</td>\n",
       "      <td>22.600074</td>\n",
       "      <td>1.406054</td>\n",
       "      <td>1.668331</td>\n",
       "      <td>1.281568</td>\n",
       "      <td>...</td>\n",
       "      <td>0.304606</td>\n",
       "      <td>0.381958</td>\n",
       "      <td>0.399565</td>\n",
       "      <td>0.317505</td>\n",
       "      <td>0.525019</td>\n",
       "      <td>19.153367</td>\n",
       "      <td>0.359582</td>\n",
       "      <td>0.477520</td>\n",
       "      <td>0.341243</td>\n",
       "      <td>128.470916</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>14.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>39.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>3.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>75.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>2186.000000</td>\n",
       "      <td>82.000000</td>\n",
       "      <td>85.000000</td>\n",
       "      <td>71.000000</td>\n",
       "      <td>1801.000000</td>\n",
       "      <td>306.000000</td>\n",
       "      <td>2120.000000</td>\n",
       "      <td>23.000000</td>\n",
       "      <td>51.000000</td>\n",
       "      <td>51.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>10.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>23.000000</td>\n",
       "      <td>1801.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>16.000000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>9664.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>8 rows × 101 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "               c_1          c_2          c_3          c_4          c_5  \\\n",
       "count  8846.000000  8846.000000  8846.000000  8846.000000  8846.000000   \n",
       "mean      2.406285     1.398372     1.279674     0.863780     1.199186   \n",
       "std      23.632890     2.949282     2.764838     2.021167    19.272932   \n",
       "min       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "25%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "50%       1.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "75%       3.000000     2.000000     2.000000     1.000000     1.000000   \n",
       "max    2186.000000    82.000000    85.000000    71.000000  1801.000000   \n",
       "\n",
       "               c_6          c_7          c_8          c_9         c_10  \\\n",
       "count  8846.000000  8846.000000  8846.000000  8846.000000  8846.000000   \n",
       "mean      2.596428     1.055505     0.569749     0.649220     0.540244   \n",
       "std       7.697351    22.600074     1.406054     1.668331     1.281568   \n",
       "min       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "25%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "50%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "75%       2.000000     1.000000     1.000000     1.000000     1.000000   \n",
       "max     306.000000  2120.000000    23.000000    51.000000    51.000000   \n",
       "\n",
       "          ...              c_92         c_93         c_94         c_95  \\\n",
       "count     ...       8846.000000  8846.000000  8846.000000  8846.000000   \n",
       "mean      ...          0.062288     0.082071     0.093828     0.069184   \n",
       "std       ...          0.304606     0.381958     0.399565     0.317505   \n",
       "min       ...          0.000000     0.000000     0.000000     0.000000   \n",
       "25%       ...          0.000000     0.000000     0.000000     0.000000   \n",
       "50%       ...          0.000000     0.000000     0.000000     0.000000   \n",
       "75%       ...          0.000000     0.000000     0.000000     0.000000   \n",
       "max       ...          6.000000     9.000000    10.000000     9.000000   \n",
       "\n",
       "              c_96         c_97         c_98         c_99        c_100  \\\n",
       "count  8846.000000  8846.000000  8846.000000  8846.000000  8846.000000   \n",
       "mean      0.078793     0.302736     0.079810     0.081054     0.072914   \n",
       "std       0.525019    19.153367     0.359582     0.477520     0.341243   \n",
       "min       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "25%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "50%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "75%       0.000000     0.000000     0.000000     0.000000     0.000000   \n",
       "max      23.000000  1801.000000     9.000000    16.000000     7.000000   \n",
       "\n",
       "           c_other  \n",
       "count  8846.000000  \n",
       "mean     58.926407  \n",
       "std     128.470916  \n",
       "min       0.000000  \n",
       "25%      14.000000  \n",
       "50%      39.000000  \n",
       "75%      75.000000  \n",
       "max    9664.000000  \n",
       "\n",
       "[8 rows x 101 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "labels = [\"event_id\",\"user_id\",\"start_time\",\"city\",\"state\",\"zip\",\"country\",\"lat\",\"lng\"]\n",
    "X_train = train.drop(labels,axis=1)\n",
    "\n",
    "X_train.describe()"
   ]
  },
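  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of listing every column to drop, the keyword columns could also be selected by name. A minimal sketch on a toy frame (`toy` is made up for illustration), assuming the keyword columns are exactly those whose names start with `c_`; note that `city` and `country` do not match this pattern:\n",
    "\n",
    "```\n",
    "import pandas as pd\n",
    "\n",
    "# Toy frame with a few metadata columns and keyword counts (hypothetical values)\n",
    "toy = pd.DataFrame({'event_id': [1, 2], 'city': ['a', 'b'], 'c_1': [3, 0], 'c_2': [0, 1], 'c_other': [10, 4]})\n",
    "\n",
    "# Keep only the columns whose name starts with c_\n",
    "X = toy.filter(regex=r'^c_')\n",
    "print(list(X.columns))\n",
    "```\n",
    "\n",
    "This keeps the selection correct even if further metadata columns are added to the file later."
   ]
  },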
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 8846 entries, 2 to 3137701\n",
      "Columns: 101 entries, c_1 to c_other\n",
      "dtypes: int64(101)\n",
      "memory usage: 6.9 MB\n"
     ]
    }
   ],
   "source": [
    "X_train.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Clustering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K-means begin with clusters: 10\n",
      "CH_score: 0.366749186815, time elaps:3\n",
      "K-means begin with clusters: 20\n",
      "CH_score: 0.325254042585, time elaps:3\n",
      "K-means begin with clusters: 30\n",
      "CH_score: 0.213624855063, time elaps:2\n",
      "K-means begin with clusters: 40\n",
      "CH_score: 0.148283274699, time elaps:2\n",
      "K-means begin with clusters: 50\n",
      "CH_score: 0.134116227355, time elaps:2\n",
      "K-means begin with clusters: 60\n",
      "CH_score: 0.156504584364, time elaps:2\n",
      "K-means begin with clusters: 70\n",
      "CH_score: 0.119261605271, time elaps:2\n",
      "K-means begin with clusters: 80\n",
      "CH_score: 0.118241005309, time elaps:2\n",
      "K-means begin with clusters: 90\n",
      "CH_score: 0.0782689177958, time elaps:3\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x114a878d0>]"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAD8CAYAAABw1c+bAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHxlJREFUeJzt3Xl4VOXdxvHvDyhVcEGBqhUUKRSMCBEiKqCyaUGtaAUF\n96VF3oobVURFUrW2rohaUREVWxWkrVTUt1rcXisumCggFFFcARciCoi1LPJ7/3gmZUyjmYTJnDNz\n7s915UpmOcmdoPc885zlMXdHRESSo0HUAUREJLdU/CIiCaPiFxFJGBW/iEjCqPhFRBJGxS8ikjAq\nfhGRhFHxi4gkjIpfRCRhGkUdoDotWrTwNm3aRB1DRCRvlJeXf+ruLTN5biyLv02bNpSVlUUdQ0Qk\nb5jZ+5k+V1M9IiIJo+IXEUkYFb+ISMKo+EVEEkbFLyKSMCp+EZGEUfGLiCRMQRX/lVfCq69GnUJE\nJN4Kpvg/+wwmTYIDDoBbbwUtJSwiUr2CKf4dd4TXXoP+/WHkSDj2WFi9OupUIiLxUzDFD9CiBTzy\nCFx7LcyYAV27gq78ICLyTQVV/AANGsCFF8Jzz8GGDdCjB9xyi6Z+REQqFVzxV+rRI0z9DBgA55wD\ngwfDqlVRpxIRiV7BFj9A8+bw8MNwww0wcybssw/MmRN1KhGRaBV08QOYwahR8PzzYbqnVy+YMEFT\nPyKSXAVf/JX22y9M/Rx2GJx/Phx9dDgEVEQkaTIqfjMbYGaLzWyJmY2p5vFBZjbfzOaaWZmZ9Up7\n7D0ze73ysWyGr60ddghH+0yYAP/7v2Hq56WXokwkIpJ7NRa/mTUEbgUGAkXAMDMrqvK0p4Au7l4M\nnA5MrvJ4H3cvdveSLGTeImZw7rkwe3Y4AujAA8M+AE39iEhSZDLi7w4scfd33H09MA0YlP4Ed1/r\n/p/qbArEvkb33TdM/Rx5JFxwQfi8cmXUqURE6l8mxb8rsDTt9rLUfd9gZkeb2RvAY4RRfyUHnjSz\ncjMbviVhs61ZM/jzn+Hmm+Hvfw9TPy+8EHUqEZH6lbWdu+4+w907AkcBV6Y91Cs1BTQQOMvMDqpu\nezMbnto/UFZRUZGtWDUyg7PPDoX/ve/BQQfBddfBpk05iyAiklOZFP9yoHXa7Vap+6rl7s8Bbc2s\nRer28tTnFcAMwtRRddtNcvcSdy9p2bJlhvGzp1u3cGXPo4+G0aPhpz+FTz/NeQwRkXqXSfG/ArQ3\nsz3MrDEwFJiZ/gQza2dmlvq6K/B9YKWZNTWzbVP3NwUOBRZk8xfIpu23h+nTw9U9n3wSiovD8f8i\nIoWkxuJ3943ASOAJYBEw3d0XmtkIMxuRetoxwAIzm0s4Aui41M7enYDnzWweMAd4zN0fr49fJFvM\n4Je/hBdfhK22gt694eqrNfUjIoXDPIbHMZaUlHhZDC6ruWYNDB8ODz4Yrvnzhz9ABLNQIiI1MrPy\nTA+ZT8yZu3Wx3XYwdSrcdhs880yY+nnuuahTiYhsGRV/DcxgxIhwhm/TptCnD1x1laZ+RCR/qfgz\nVFwM5eVw3HEwdmyY+lmxIupUIiK1p+KvhW23hfvvD2v7/uMf4cXg2WejTiUiUjsq/loyg1/8Al5+\nOewD6NcPrrgCvv466mQiIplR8ddR585hPd/jj4fSUvjJT+Djj6NOJSJSMxX/Fthmm3CI5113hUs+\nFBfDU09FnUpE5Lup+LeQGZx+eljScccd4ZBDwjsATf2ISFyp+LOkUyd45RU4+eQw59+/P3z0UdSp\nRET+m4o/i5o2hSlT4J57wjuA4mKYNSvqVCIi36TirwennhpG/y1ahJ2+Y8fCxo1RpxIRCVT89aSo\nKJT/aaeFM31HjYo6kYhIoOKvR02ahCN+fv5zuP12WLq05m1EROqbij8Hxo4Nn6++OtocIiKg4s+J\n3XcPUz6TJ8OyZVGnEZGkU/HnyMUXhy
t6atQvIlFT8edImzbhaJ8774Tl37pisYhI/VPx59All2jU\nLyLRU/Hn0B57wCmnhFH/hx9GnUZEkkrFn2OXXBJO5rrmmqiTiEhSqfhzrG3bMOq/4w6N+kUkGir+\nCFx6aRj1X3tt1ElEJIlU/BFo2xZOOimM+nUFTxHJNRV/RC69FDZs0KhfRHJPxR+Rdu3gxBPDNXy0\nZKOI5FJGxW9mA8xssZktMbMx1Tw+yMzmm9lcMyszs16Zbptkl14K69fDdddFnUREkqTG4jezhsCt\nwECgCBhmZkVVnvYU0MXdi4HTgcm12Dax2rcPo/7bboNPPok6jYgkRSYj/u7AEnd/x93XA9OAQelP\ncPe17u6pm00Bz3TbpBs7Ftat06hfRHInk+LfFUi/kvyy1H3fYGZHm9kbwGOEUX/G2yZZ+/Zw/PEw\ncSKsWBF1GhFJgqzt3HX3Ge7eETgKuLK225vZ8NT+gbKKiopsxcoLGvWLSC5lUvzLgdZpt1ul7quW\nuz8HtDWzFrXZ1t0nuXuJu5e0bNkyg1iFo0MHGDZMo34RyY1Miv8VoL2Z7WFmjYGhwMz0J5hZOzOz\n1Nddge8DKzPZVoKxY+Grr+CGG6JOIiKFrsbid/eNwEjgCWARMN3dF5rZCDMbkXraMcACM5tLOIrn\nOA+q3bY+fpF817FjGPX//veQsJkuEckx23wwTnyUlJR4WVlZ1DFybtEi2GsvGD1a1+wXkdoxs3J3\nL8nkuTpzN0b23BOOOy6M+j/9NOo0IlKoVPwxc9ll8K9/aa5fROqPij9miorg2GPDqH/lyqjTiEgh\nUvHH0GWXwZdfwvjxUScRkUKk4o+hvfaCIUPg5ps16heR7FPxx9Rll8HatXDjjVEnEZFCo+KPqU6d\nYPDgMOr/7LOo04hIIVHxx9i4cfDFFxr1i0h2qfhjbO+94Zhjwqj/88+jTiMihULFH3PjxsGaNTBh\nQtRJRKRQqPhjrnNn+NnPQvFr1C8i2aDizwOVo/6bboo6iYgUAhV/HujSBY46Koz6V62KOo2I5DsV\nf54YNw5Wr9aoX0S2nIo/T+yzDwwaFEb9q1dHnUZE8pmKP4+MGxemem6+OeokIpLPVPx5pGtXOPLI\ncPE2jfpFpK5U/HmmctR/yy1RJxGRfKXizzPdusERR4RR/5o1UacRkXyk4s9DpaXhZC6N+kWkLlT8\neaikBA4/PIz6v/gi6jQikm9U/HmqtDRcrvn3v486iYjkGxV/ntp3XzjsMLj+eo36RaR2VPx5rHLU\nf+utUScRkXyi4s9j3bvDgAFh1L92bdRpRCRfqPjzXGlpWJBdo34RyVRGxW9mA8xssZktMbMx1Tx+\ngpnNN7PXzewFM+uS9th7qfvnmllZNsML7L8//OQnGvWLSOZqLH4zawjcCgwEioBhZlZU5WnvAge7\n+97AlcCkKo/3cfdidy/JQmaporQUPv0Ubrst6iQikg8yGfF3B5a4+zvuvh6YBgxKf4K7v+DuletD\nvQS0ym5M+S4HHACHHgrXXQdffhl1GhGJu0yKf1dgadrtZan7vs0ZwN/SbjvwpJmVm9nwb9vIzIab\nWZmZlVVUVGQQS9KVlkJFhUb9IlKzrO7cNbM+hOK/KO3uXu5eTJgqOsvMDqpuW3ef5O4l7l7SsmXL\nbMZKhB49oH9/jfpFpGaZFP9yoHXa7Vap+77BzDoDk4FB7r6y8n53X576vAKYQZg6knpQWgorVsDt\nt0edRETiLJPifwVob2Z7mFljYCgwM/0JZrYb8BBwkru/mXZ/UzPbtvJr4FBgQbbCyzf16gX9+sG1\n18K//hV1GhGJqxqL3903AiOBJ4BFwHR3X2hmI8xsROpp44DmwMQqh23uBDxvZvOAOcBj7v541n8L\n+Y/KUf8dd0SdRETiytw96gz/paSkxMvKdMh/XfXrBwsXwjvvQJMmUacRkVwws/JMD5nXmbsFqLQU\nPv
kEJlU9m0JEBBV/QTroIOjdG665Br76Kuo0IhI3Kv4CVVoKH3+sUb+I/DcVf4Hq3RsOPjiM+v/9\n76jTiEicqPgLWGkpfPQR3Hln1ElEJE5U/AWsd+8w33/11Rr1i8hmKv4CZhZG/R9+CJMnR51GROJC\nxV/g+vQJZ/Rq1C8ilVT8Bc4Mfv1rWL4c7ror6jQiEgcq/gTo2xd69gyj/nXrok4jIlFT8SdA5Vz/\nsmVw991RpxGRqKn4E6J//3DN/t/+VqN+kaRT8SdE+qj/nnuiTiMiUVLxJ8ghh8D++4dR//r1UacR\nkaio+BOk8gifpUs16hdJMhV/whx6KOy3n0b9Ikmm4k+Yyrn+Dz6Ae++NOo2IREHFn0ADBkD37nDV\nVRr1iySRij+BKkf977+vuX6RJFLxJ9TAgXDggTBqFLz6atRpRCSXVPwJZQbTp0OLFvDTn4bj+0Uk\nGVT8CbbzzvDoo/DFF6H8166NOpGI5IKKP+H23hsefBDmz4cTToCvv446kYjUNxW/MHAg3HQTzJwJ\no0dHnUZE6lujqANIPIwcCW++CePHw49/DGeeGXUiEakvGY34zWyAmS02syVmNqaax08ws/lm9rqZ\nvWBmXTLdVuJj/Pgw+j/rLJg1K+o0IlJfaix+M2sI3AoMBIqAYWZWVOVp7wIHu/vewJXApFpsKzHR\nqBFMmwZFRTB4MPzzn1EnEpH6kMmIvzuwxN3fcff1wDRgUPoT3P0Fd/88dfMloFWm20q8bLddONJn\n663hiCNgxYqoE4lItmVS/LsCS9NuL0vd923OAP5Wx20lBnbbLezo/egjOOooLdIuUmiyelSPmfUh\nFP9Fddh2uJmVmVlZRUVFNmNJHXTvDn/8I7z4Ipx+OrhHnUhEsiWT4l8OtE673Sp13zeYWWdgMjDI\n3VfWZlsAd5/k7iXuXtKyZctMsks9Gzw4XL556lS4/PKo04hItmRS/K8A7c1sDzNrDAwFZqY/wcx2\nAx4CTnL3N2uzrcTbmDFw6qmh+O+/P+o0IpINNR7H7+4bzWwk8ATQELjb3Rea2YjU47cD44DmwEQz\nA9iYGr1Xu209/S5SD8zgjjvg3XfDlE+bNtCzZ9SpRGRLmMdw8rakpMTLysqijiFpVq6EAw6Azz+H\nl1+Gtm2jTiQi6cys3N1LMnmuLtkgGWneHB57LFzL5/DDYdWqqBOJSF2p+CVj7dvDjBnw9tswZAhs\n2BB1IhGpCxW/1MrBB8OkSfDkk+H6PjGcKRSRGugibVJrp54aLuj2u99Bhw5hFS8RyR8qfqmT3/wG\n3noLLrgAfvQjGKQLcYjkDU31SJ00aAD33gslJXD88Vq3VySfqPilzpo0Cdf0ad48LN24vNpzskUk\nblT8skUq1+1ds0br9orkCxW/bLHOncO6vfPmad1ekXyg4pesOOwwmDAhTP1cVOtrs4pILumoHsma\ns88Oh3necENYt3f48KgTiUh1VPySVTfeGM7s/eUvw/V8+vePOpGIVKWpHsmqynV799wzXM9/0aKo\nE4lIVSp+ybrKdXu32ipc0E0LqonEi4pf6sXuu8PDD4d1e48+Wuv2isSJil/qzX77wR/+ALNnwxln\n6IJuInGh4pd6NWQIXHUVPPAAXHll1GlEBHRUj+TAxReHwzxLS8M1/YcNizqRSLJpxC/1rnLd3oMO\ngtNOgxdeiDqRSLKp+CUnvv99eOghaN0ajjoqLN4uItFQ8UvONG8eDvPcuFHr9opEScUvOdWhQxj5\nv/UWHHus1u0ViYKKX3Kud++wbu+sWeH6PjrMUyS3dFSPROK008KRPldfHd4FnH9+1Iny09dfh4Xv\nt90WevSIOo3kC434JTJXXQXHHAO/+hU88kjUafLLhx+G8yLatoUBA6BnT/jFL8KCOCI1yaj4zWyA\nmS02syVmNqaaxzua2Ytmts7MLqjy2Htm9rqZzTWzsmwFl/zXoEE4
s7dbt3Bs/9y5USeKt02b4Ikn\nwiUwdtsNxo0Ll7+ePj2sgXD33dCpU5hCE/kuNRa/mTUEbgUGAkXAMDMrqvK0z4BzgOu/5dv0cfdi\ndy/ZkrBSeCrX7d1xRzjiiDCSlW/65BP43e+gXbswup89O7xLeuutUPJDhoQps9mzw9/z0ENhxAj4\n4ouok0tcZTLi7w4scfd33H09MA0YlP4Ed1/h7q8AOkZDam2XXcJUz+rVYd3eL7+MOlH0Nm2Cp54K\npd6qFVxySbjw3bRpsHQpXHNNeCFIt//+8NprcMEFYef53nvD009Hk1/iLZPi3xVYmnZ7Weq+TDnw\npJmVm5nWZJJqdekSSm3uXDjxxFB8SVRRAdddF3Z49+8fivucc+CNN+CZZ+C448LJcN9m663D9v/4\nBzRuDP36wciRsHZt7n4Hib9c7Nzt5e7FhKmis8zsoOqeZGbDzazMzMoqdAH3RDr88LCC11//Cocc\nEtbwnT+/8F8E3OH//i/s52jVCkaPhp13hvvug+XLw1KWHTrU7nv27BleRM8/HyZOhM6dw88QgcyK\nfznQOu12q9R9GXH35anPK4AZhKmj6p43yd1L3L2kZcuWmX57KTBnnx3ms99/P5RWly7wgx+E1bwm\nTgwrehXKcf+ffRZe6IqKwrkNjz8e5uYXLAgj9hNOCIvZ1FWTJjB+fCj8Bg3Czzj3XE2lSWbF/wrQ\n3sz2MLPGwFBgZibf3Myamtm2lV8DhwIL6hpWCp8ZjBkDS5aE8p8yJez0nTMHzjorlOQPfwjHHw93\n3hmel08vBO5hJ+xJJ4XfY9QoaNYs/J7Ll8NNN8Fee2X3Zx54IMybF15Ub745vJg+/3x2f4bkF/MM\n/q8xs8OACUBD4G53v8rMRgC4++1mtjNQBmwHbALWEo4AakEY5UM4WewBd7+qpp9XUlLiZWU68lM2\nc4d33gnz3E8/HT5//HF4rHVr6NNn88fuu0ebtTqrVsEf/xiuUrpwYTjh6qST4MwzwzRMrjz7LJx+\nOrz3Hpx3HvzmN+GdgeQ/MyvP9MjJjIo/11T8UhN3WLx484vAs8/Cp5+Gx9q2DS8AffuGz7vsEl3G\nOXPg9tvhwQfhq6+gpCSU/dChsM020eRauzYc9z9xYlgfYcoUnfVbCFT8kjibNoW58Wee2fxCsHp1\neKxDh80vBL17Q33vQlqzBu6/P4zu582Dpk3DfP2ZZ0LXrvX7s2vj6afD6P+DD8J5AVdcEY4Kkvyk\n4pfE+/rrcFRL5QvBc89tPqSxU6fN7wYOPhh22CE7P7O8PJT9Aw+EHajFxaHsjz8ettsuOz8j2774\nAi68MOTu2DGM/vfbL+pUUhcqfpEqNmwIxVy5j2D27DD1Ygb77LN5/8CBB9aupNeuhalTQ3GWl4cR\n89Ch4eicffcN3z8fzJoFZ5wRdjBfeCH8+tdbdkSR5J6KX6QG69aF+ffKfQQvvgjr10PDhmEevnJq\nqGfP6nd+zpsXyv6++8KouVOnMLo/8cRwlE4+Wr06nPU7eXI4emrKlPDiJflBxS9SS199FdYCrpwa\nmjMnrBT2ve+FqY/KdwTvvRcK/+WXwxm0xx4bRvcHHJA/o/uaPP44/Pzn4aipiy4KF4P7rrOFJR5U\n/CJbaO3acKx75QtBefnmM4g7dgyj+5NPDheXK0SrVoVzDO65J7ybuffeeO2Ylv+m4hfJslWrwgtB\ns2Zh+qdQRvc1eeyxcJ3/FSvg0kvDR+PGUaeS6tSm+LUQi0gGmjULZxD36pWc0odw/aSFC8PhqFdc\nAd27a92EQqDiF5HvtMMOYarn4YfD2gD77hteBDboIux5S8UvIhk58shwktyxx0Jpadjp/frrUaeS\nulDxi0jGmjcPZyU/9FA45r9bt7B28saNUSeT2lDxi0itHX10mPs/5hgYOzas/rVwYdSpJFMqfhGp\nkxYtwlnLf/pTuIR2165h7V+N
/uNPxS8iW2Tw4DDaP/JIuPjicLjrokVRp5LvouIXkS32gx+Ekf+D\nD8Lbb4frH113XbhYnsSPTuASkaz65BP4n/+BGTPC+Q9bbw2NGoXLX1R+pN/O9LFsfo927aJbp6G+\n1OYErkb1HUZEkmWnneAvfwkfTz8djvffuDF8rvxIv71xY7hW0po1NT8v/bEtGbNuvTVMmBDOSk7S\nCXmVVPwiknVmYe5/8OD6+xmbNmX2AlH1sXXr4Prrw/WWHn88rN3cvHn95YwjFb+I5KUGDcJVQ+ty\n5dD+/WH8eLjkkrD4/H33hdXZkkI7d0UkcRo0CGsPvPhiWG+hb99wAbqkXIZCxS8iidWtG7z6alh7\n+Le/DRfhe/vtqFPVPxW/iCTaNtuEVccefBAWLw6Hot53X9Sp6peKX0SEcPG5efPCnP9JJ4VlNNes\niTpV/VDxi4ik7L57WHHt8svD5SiKi+Gll6JOlX0qfhGRNI0ahXWGn3suHDLaq1e4AmkhnYWcUfGb\n2QAzW2xmS8xsTDWPdzSzF81snZldUJttRUTiqGfPsNrYkCHhCqT9+sHSpVGnyo4ai9/MGgK3AgOB\nImCYmRVVedpnwDnA9XXYVkQklpo1gwcegClToKwszP8/9FDUqbZcJiP+7sASd3/H3dcD04BB6U9w\n9xXu/gpQ9SjYGrcVEYkzMzjlFHjtNfjRj8IaBMOHw5dfRp2s7jIp/l2B9Dc4y1L3ZWJLthURiY32\n7WH2bLjoonD4Z0lJ/i48H5udu2Y23MzKzKysoqIi6jgiIv+lceOw2MysWbB6dVh3+MYbw07gfJJJ\n8S8HWqfdbpW6LxMZb+vuk9y9xN1LWrZsmeG3FxHJvX79YP58GDAARo2Cww8Pl6POF5kU/ytAezPb\nw8waA0OBmRl+/y3ZVkQktlq0gL/+FSZOhGefhc6d4W9/izpVZmosfnffCIwEngAWAdPdfaGZjTCz\nEQBmtrOZLQNGAWPNbJmZbfdt29bXLyMikktmYdGZsrKwDsFhh8F554VLP8eZVuASEcmCf/8bRo+G\nW24Jh31OnQp77pm7n1+bFbhis3NXRCSfbbUV3HwzPPIILF8ervx5xx1btlJYfVHxi4hk0RFHhB2/\nvXrBiBHhuP+VK6NO9U0qfhGRLNtll7Cs4/XXw6OPhqmfZ56JOtVmKn4RkXrQoAH86lfh6p5Nm4ZD\nQOOyypeKX0SkHnXtCuXl8VrlS8UvIlLPKlf5mj4d3nwz+lW+VPwiIjkyZEhY5au4ONpVvlT8IiI5\ntNtuYUfvFVfAtGnRrPKl4hcRybGGDeGyy8IqX+65X+VLxS8iEpEePb65ylffvrB2bf3/XBW/iEiE\ntt8+rPJ1773Qrl049LO+Nar/HyEiIt/FDE4+OXzkgkb8IiIJo+IXEUkYFb+ISMKo+EVEEkbFLyKS\nMCp+EZGEUfGLiCSMil9EJGFiudi6mVUA79dx8xbAp1mMky3KVTvKVTvKVTuFmGt3d2+ZyRNjWfxb\nwszKMl1pPpeUq3aUq3aUq3aSnktTPSIiCaPiFxFJmEIs/klRB/gWylU7ylU7ylU7ic5VcHP8IiLy\n3QpxxC8iIt8hr4vfzO42sxVmtiDtvh3NbJaZvZX6vEOOM7U2s2fM7J9mttDMzo1Jrq3MbI6ZzUvl\nujwOudLyNTSz18zs0bjkMrP3zOx1M5trZmUxytXMzP5sZm+Y2SIzOyAmuTqk/laVH2vM7Lyos5nZ\n+an/5heY2dTU/wtx+Hudm8q00MzOS92Xk1x5XfzAFGBAlfvGAE+5e3vgqdTtXNoI/Mrdi4D9gbPM\nrCgGudYBfd29C1AMDDCz/WOQq9K5wKK023HJ1cfdi9MOsYtDrpuAx929I9CF8HeLPJe7L079rYqB\nbsC/gBlRZjOzXYFzgBJ37wQ0BIZGmSmVqxPwC6A74d/wCDNrl7Nc7p7XH0AbYEHa7cXALqmvdw
EW\nR5zvYeCQOOUCmgCvAvvFIRfQKvUfeV/g0bj8OwLvAS2q3BdpLmB74F1S++fikquanIcCs6POBuwK\nLAV2JKw4+GgqW9T/jkOAu9JuXwaMzlWufB/xV2cnd/8o9fXHwE5RBTGzNsA+wMvEIFdqOmUusAKY\n5e6xyAVMIPxHvyntvjjkcuBJMys3s+ExybUHUAHck5oam2xmTWOQq6qhwNTU15Flc/flwPXAB8BH\nwGp3/3uUmVIWAAeaWXMzawIcBrTOVa5CLP7/8PCyGclhS2a2DfAX4Dx3XxOHXO7+tYe34a2A7qm3\nm5HmMrMjgBXuXv5tz4nw37FX6u81kDBld1AMcjUCugK3ufs+wJdUmQ6I8r97ADNrDBwJ/KnqY7nO\nlpojH0R4wfwh0NTMTowyU+pnLgKuAf4OPA7MBb7OVa5CLP5PzGwXgNTnFbkOYGbfI5T+/e7+UFxy\nVXL3VcAzhP0jUefqCRxpZu8B04C+ZnZfDHJVjhZx9xWEueruMci1DFiWercG8GfCC0HUudINBF51\n909St6PM1h94190r3H0D8BDQI+JMALj7Xe7ezd0PAj4H3sxVrkIs/pnAKamvTyHMseeMmRlwF7DI\n3cfHKFdLM2uW+nprwn6HN6LO5e4Xu3srd29DmB542t1PjDqXmTU1s20rvybMCy+IOpe7fwwsNbMO\nqbv6Af+MOlcVw9g8zQPRZvsA2N/MmqT+3+xH2Bke+d/LzH6Q+rwb8DPggZzlyuUOjXrYQTKVMG+3\ngTASOgNoTthR+BbwJLBjjjP1Irw9m094+zaXMH8Xda7OwGupXAuAcan7I81VJWNvNu/cjfrv1RaY\nl/pYCFwah1ypDMVAWerf8q/ADnHIlcrWFFgJbJ92X9T/lpcTBjkLgD8C3486UyrXPwgv2vOAfrn8\nW+nMXRGRhCnEqR4REfkOKn4RkYRR8YuIJIyKX0QkYVT8IiIJo+IXEUkYFb+ISMKo+EVEEub/AU95\n/cs1ndzUAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x11d5cec10>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sklearn.cluster import MiniBatchKMeans\n",
    "\n",
    "from sklearn import metrics\n",
    "\n",
    "import time\n",
    "\n",
    "def K_cluster_analysis(K, X_train):\n",
    "    start = time.time()\n",
    "\n",
    "    print(\"K-means begin with clusters: {}\".format(K))\n",
    "\n",
    "    # Fit MiniBatchKMeans on the training set\n",
    "    mb_kmeans = MiniBatchKMeans(n_clusters=K)\n",
    "    mb_kmeans.fit(X_train)\n",
    "    \n",
    "    # Criteria for evaluating K:\n",
    "    # common choices are the Silhouette Coefficient and the Calinski-Harabasz Index;\n",
    "    # for both, a larger value indicates better clustering.\n",
    "    # NOTE: the value stored in CH_score below is the silhouette score, not the CH index.\n",
    "    # CH_score = metrics.calinski_harabaz_score(X_train,mb_kmeans.predict(X_train))\n",
    "    CH_score = metrics.silhouette_score(X_train, mb_kmeans.predict(X_train))\n",
    "    end = time.time()\n",
    "    print(\"CH_score: {}, time elaps:{}\".format(CH_score, int(end - start)))\n",
    "    return CH_score\n",
    "\n",
    "\n",
    "# Ks = [10,20]\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "Ks = np.arange(10,100,10)\n",
    "\n",
    "CH_scores = []\n",
    "\n",
    "for K in Ks:\n",
    "    ch = K_cluster_analysis(K, X_train)\n",
    "    CH_scores.append(ch)\n",
    "    \n",
    "\n",
    "plt.plot(Ks, np.array(CH_scores), 'b-')"
   ]
  },
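  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `K_cluster_analysis` function above stores the result of `metrics.silhouette_score` in a variable named `CH_score`, so the curve it plots is a silhouette curve. The two metrics live on different scales, and it may help to see them side by side. The following is a minimal sketch, not the notebook's own method: it assumes scikit-learn 0.20 or later (where the function is spelled `calinski_harabasz_score`) and uses synthetic data from `make_blobs` as a stand-in for `X_train`; `X_demo` is a hypothetical name.\n",
    "\n",
    "```python\n",
    "from sklearn.cluster import MiniBatchKMeans\n",
    "from sklearn.datasets import make_blobs\n",
    "from sklearn.metrics import calinski_harabasz_score, silhouette_score\n",
    "\n",
    "# Synthetic stand-in for X_train (assumption: 500 points drawn from 4 blobs)\n",
    "X_demo, _ = make_blobs(n_samples=500, centers=4, random_state=0)\n",
    "\n",
    "for k in (2, 4, 8):\n",
    "    labels = MiniBatchKMeans(n_clusters=k, random_state=0).fit_predict(X_demo)\n",
    "    ch = calinski_harabasz_score(X_demo, labels)  # non-negative and unbounded\n",
    "    sil = silhouette_score(X_demo, labels)        # always within [-1, 1]\n",
    "    print('K={}: CH={:.1f}, silhouette={:.3f}'.format(k, ch, sil))\n",
    "```\n",
    "\n",
    "Both scores are read as larger is better, but their magnitudes are not comparable, so each metric should only be compared against itself across values of K."
   ]
  },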
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import sklearn.preprocessing as preprocessing\n",
    "# min_max_scaler = preprocessing.MinMaxScaler()\n",
    "# X_train_2 =  min_max_scaler.fit_transform(X_train)\n",
    "# Try standardizing the data\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "ss = StandardScaler()\n",
    "\n",
    "X_train_array =  ss.fit_transform(X_train)\n",
    "X_train_2 = pd.DataFrame(data=X_train_array, columns=X_train.columns)\n",
    "X_train_2.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K-means begin with clusters: 10\n",
      "CH_score: 0.223977847589, time elaps:3\n",
      "K-means begin with clusters: 20\n",
      "CH_score: -0.0352337673551, time elaps:3\n",
      "K-means begin with clusters: 30\n",
      "CH_score: -0.0381663216807, time elaps:3\n",
      "K-means begin with clusters: 40\n",
      "CH_score: -0.148952838759, time elaps:3\n",
      "K-means begin with clusters: 50\n",
      "CH_score: -0.106515187159, time elaps:3\n",
      "K-means begin with clusters: 60\n",
      "CH_score: -0.226302569767, time elaps:3\n",
      "K-means begin with clusters: 70\n",
      "CH_score: -0.0895294288746, time elaps:3\n",
      "K-means begin with clusters: 80\n",
      "CH_score: -0.13460295405, time elaps:2\n",
      "K-means begin with clusters: 90\n",
      "CH_score: -0.131254832852, time elaps:3\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x114d3efd0>]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAD8CAYAAAB+UHOxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHlRJREFUeJzt3Xl8VPW5x/HPQ1jV64IigqAgAhJBFiPgUrVqe91usdVE\nrAsuLcWluFaxWnu9VouteiuIgku14sJVUXFBBVHqSjUoIIoIilUoSFQQRZZAnvvHb1IiJmSZyZwz\nc77v12temRlO5jxkmW/O8zu/3zF3R0REkqdJ1AWIiEg0FAAiIgmlABARSSgFgIhIQikAREQSSgEg\nIpJQCgARkYRSAIiIJJQCQEQkoZpGXcCW7LTTTt6pU6eoyxARyRkzZ8783N3b1GXbWAdAp06dKC0t\njboMEZGcYWb/rOu2agGJiCSUAkBEJKEUACIiCaUAEBFJKAWAiEhCKQBERBJKASAiklB5FwBr1sAN\nN8C0aVFXIiISb3kXAM2bhwAYOzbqSkRE4i3vAqCgAI4/Hp5+GlavjroaEZH4yrsAACgpCa2gp5+O\nuhIRkfjKywA46CDYZRd46KGoKxERia+8DIDKNtDkyfDNN1FXIyIST3kZAKA2kIhIbfI2AA48UG0g\nEZEtyUgAmNmRZjbfzBaa2Yhq/v1kM5tjZu+Y2Wtm1jsT+92SggI44QS1gUREapJ2AJhZATAGOAoo\nBE4ys8LNNlsEHOLuvYBrgNvT3W9dlJTA2rXw1FPZ2JuISG7JxBFAf2Chu3/k7uuBCcCgqhu4+2vu\nviL1cAbQIQP7rdWBB0K7dvDww9nYm4hIbslEAOwKfFrl8eLUczU5C3gmA/utVZMmagOJiNQkq4PA\nZvZDQgBctoVthppZqZmVlpWVpb3P4mK1gUREqpOJAFgCdKzyuEPque8ws32AO4FB7v5FTS/m7re7\ne5G7F7VpU6cL229RZRtIZwOJiHxXJgLgTaCrmXU2s+bAYOCJqhuY2W7Ao8Cp7v5BBvZZZ02ahKOA\nyZPh66+zuWcRkXhLOwDcfQNwHvAcMA94yN3fNbNhZjYstdlVwI7ArWY2y8xK091vfRQXw7p1agOJ\niFRl7h51DTUqKiry0tL0s6KiAjp2hP794bHHMlCYiEhMmdlMdy+qy7Z5OxO4qsqzgZ55Rm0gEZFK\niQgACJPC1q2DJ5+MuhIRkXhITADsvz/suqvOBhIRqZSYAKhsAz37LKxaFXU1IiLRS0wAgNpAIiJV\nJSoABg4MbSCtDSQikrAAqJwU9swzagOJiCQqACC0gdavhyeeqH1bEZF8lrgAGDAAOnRQG0hEJHEB\nUNkGevZZ+OqrqKsREYlO4gIAQgCsX6+zgUQk2RIZAAMGhLWBNClMRJIskQFQOSnsuefUBhKR5Epk\nAIDOBhIRSWwADBgAu+2mNpCIJFdiA8AstIGmTIGVK6OuRkQk+xIbAKA2kIgkW6IDoH//0AbSpDAR\nSaJEB4BZmBPw3HNqA4lI8iQ6ACC0gcrLYdKkqCsREcmuxAfAfvvB7rurDSQiyZP4AKhsA+lsIBFJ\nmsQHAIQAUBtIRJJGAcCmNpAmhYlIkigACG2gkhKYOhVWrIi6GhGR7FAApKgNJCJJowBIKSqCTp3U\nBhKR5FAApFSeDaQ2kIgkhQKgipIS2LABHn886kpERBqfAqCKffeFzp01KUxEkkEBUEXVNtCXX0Zd\njYhI41IAbEZtIBFJCgXAZvr1UxtIRJJBAbCZyklhzz+vNpCI5DcFQDXUBhKRJMhIAJjZkWY238wW\nmtmIav59LzN73czWmdklmdhnY+rbF/bYQ5PCRCS/pR0AZlYAjAGOAgqBk8yscLPNvgSGAzeku79s\nqNoG+uKLqKsREWkcmTgC6A8sdPeP3H09MAEYVHUDd1/u7m8C5RnYX1YUF8PGjWoDiUj+ykQA7Ap8\nWuXx4tRzOa1vX+jSRW0gEclfsRsENr
OhZlZqZqVlZWUR1hHaQNOmqQ0kIvkpEwGwBOhY5XGH1HMN\n4u63u3uRuxe1adMm7eLSUdkGeuyxSMsQEWkUmQiAN4GuZtbZzJoDg4EnMvC6kevTB/bcU20gEclP\naQeAu28AzgOeA+YBD7n7u2Y2zMyGAZjZLma2GLgIuNLMFpvZtunuu7FVrg30wgvw+edRVyMiklkZ\nGQNw98nu3s3du7j7tannxrr72NT9Ze7ewd23dfftU/dXZWLfja2kRG0gEclPsRsEjpvevaFrV60N\nJCL5RwFQC7WBRCRfKQDqQG0gEclHCoA62Gef0AbS2UAikk8UAHVQOSnshRcgwrlpIiIZpQCoo5IS\nqKhQG0hE8ocCoI569YJu3dQGEpH8oQCoo8o20Isvqg0kIvlBAVAPxcWhDfToo1FXIiKSPgVAPfTq\nBd27qw0kIvlBAVAPlZPCpk+H5cujrkZEJD0KgHqqPBtIbSARyXUKgHrq2RP22ktrA4lI7lMA1FPV\nNtBnn0VdjYhIwykAGkBtIBHJBwqABth7b7WBRCT3KQAaoHJS2N//rjaQiOQuBUADqQ0kIrlOAdBA\ne+8NPXpoUpiI5C4FQBpKSuCll2DZsqgrERGpPwVAGrQ2kIjkMgVAGvbeGwoL1QYSkdykAEiT2kAi\nkqsUAGkqLgZ3mDgx6kpEROpHAZCmwsLQCtKkMBHJNQqADCguDm2gpUujrkREpO4UABmgNpCI5CIF\nQAYUFoZlotUGEpFcogDIkOJiePlltYFEJHcoADJEbSARyTUKgAzp0SO0gTQpTERyhQIgg0pK4JVX\n4F//iroSEZHaKQAySG0gEcklCoAM2msv6NVLbSARyQ0KgAwrKYFXX4UlS6KuRERkyzISAGZ2pJnN\nN7OFZjaimn83MxuV+vc5ZtYvE/uNI7WBRCRXpB0AZlYAjAGOAgqBk8yscLPNjgK6pm5DgdvS3W9c\nde8O++yjSWEiEn+ZOALoDyx094/cfT0wARi02TaDgHs9mAFsb2btMrDvWKo8G0htIBGJs0wEwK7A\np1UeL049V99t8kZxcfj4yCPR1iEisiWxGwQ2s6FmVmpmpWVlZVGX0yDdukHv3moDiUi8ZSIAlgAd\nqzzukHquvtsA4O63u3uRuxe1adMmA+VFo7g4nA20eHHUlYiIVC8TAfAm0NXMOptZc2Aw8MRm2zwB\nnJY6G2gg8JW75/WyaZVtIJ0NJCJxlXYAuPsG4DzgOWAe8JC7v2tmw8xsWGqzycBHwELgDuCcdPcb\nd926QZ8+mhQmIvHVNBMv4u6TCW/yVZ8bW+W+A+dmYl+5pLgYrrgCPv0UOnasfXsRkWyK3SBwPlEb\nSETiTAHQiLp2VRtIROJLAdDISkrg9ddDG0hEJE4UAI1Mk8JEJK4UAI1szz2hb19NChOR+FEAZEFl\nG+iTT6KuRERkEwVAFqgNJCJxpADIgi5doF8/tYFEJF4UAFlSXAwzZqgNJCLxkZGZwFK74mK4/HI4\n/njYYw9o1Qpatvzux4beb9Ys6v+diOQiBUCWdOkC55wDr70Gc+bAmjWbbmvXQnl5w1+7oCD9IBk0\nKASTiCSHAiCLxoyp+d82btwUBlWDIRP3v/ii+ufXrNm0/xtvhLfegp13bvyvg4jEgwIgJgoKYJtt\nwi1b3GH9epg1Cw49FAYPhilToKl+KkQSQYPACWYGLVrAgAEwdiy8+CL89rdRVyUi2aIAEACGDIGz\nz4Y//1mrl4okhQJA/u1//zccDZx+Orz/ftTViEhjUwDIv7VoEWYrt2oFP/sZfP111BWJSGNSAMh3\ndOgAEybA/Plw5plhoFhE8pMCQL7nsMNg5MhwNHDTTVFXIyKNRQEg1brkktAGuuwymD496mpEpDEo\nAKRaZnD33eF6BieeCEuWRF2RiGSaAkBqtO228OijsHo1nHBCmDQmIvlDASBbVFgYjgRmzICLLoq6\nGh
HJJAWA1Kq4GC6+OKxlNH581NWISKYoAKRORo6EQw6BX/0KZs+OuhoRyQQFgNRJ06bwf/8HO+wQ\nzg5asSLqikQkXQoAqbO2bcPcgE8/hVNPhYqKqCsSkXQoAKRe9t8/rBn09NNw7bVRVyMi6VAASL2d\ncw6ccgr8/vfw7LNRVyMiDaUAkHozg3HjoFcv+PnPYdGiqCsSkYZQAEiDbLVVmCRWUREudF/18pIi\nkhsUANJgXbrAfffB22+HtpBWDhXJLQoAScuxx8Lvfgf33AN33BF1NSJSHwoASdvvfw9HHgm//jW8\n8UbU1Ujc/Pd/w1FHwauvRl2JbE4BIGkrKID774f27cOicWVlUVckcfHqq3D11TBtGhx0EBxzTGgZ\nSjykFQBm1trMpprZgtTHHWrY7q9mttzM5qazP4mv1q3DxeSXL4fBg2HDhqgrkqitXw9Dh8Juu8Hi\nxWE5kddfh379wvpS8+ZFXaGkewQwApjm7l2BaanH1bkHODLNfUnM9esHt90GL7wQxgUk2a6/Ht57\nL/xM7LxzuLjQokVw1VVh/kjPnjBkCHz0UdSVJle6ATAI+Fvq/t+A46rbyN1fAr5Mc1+SA844IywY\nN3IkPPZY1NVIVObPhz/8IVxM6OijNz2/3XahJbRoUVhe/KGHoHt3OPtsXXQoCukGQFt3X5q6vwxo\nm+brYWZDzazUzErL1EzOSTffDPvtF/66mz8/6mok2yoqQutnq63Cz0J1dtoJ/vxn+PBD+OUv4a67\nwtXnLrkEPv88u/UmWa0BYGbPm9ncam6Dqm7n7g6kfSa4u9/u7kXuXtSmTZt0X04i0KJFGA9o0SKs\nHPrNN1FXJNl0993w0ktwww1hAcEtad8ebr01/KFw4olhnanOnUOb6KuvslNvktUaAO5+hLv3rOY2\nCfjMzNoBpD4ub+yCJTd07AgTJsD778MvfpE/k8TcYfp0eOedqCuJp2XLwl/xhxwCZ55Z98/r3DnM\nJZk7N5wyes014bmRI8MlSaVxpNsCegIYkro/BJiU5utJHjn8cLjuunAdgZpaAbnCHZ56CgYOhB/+\nMNzUs/6+Cy4Iy4KMGxfWjKqvHj3CuMBbb8EBB8Dll4cZ56NGwbp1ma838dy9wTdgR8LZPwuA54HW\nqefbA5OrbPcgsBQoBxYDZ9Xl9ffdd1+X3FZR4f7Tn7oXFLj//e9RV1N/Gze6T5zo3revO7h36uR+\n/fXuW23lfthh7hs2RF1hfDz1VPgaXXNN5l7z1VfdDz00vO5uu7nfead7eXnmXj8fAaVe1/fwum4Y\nxU0BkB+++sq9Wzf3tm3dlyyJupq62bDBfcIE9549w2/Jnnu63323+/r14d/vuis8/8c/RlpmbHz9\ndXiDLix0X7cus69dUeE+dap7//7ha961q/sDD4Rwlu+rTwBoJrA0um23DSuHfvNNmAC0fn3UFdVs\nw4awwF3PnmFC28aN4fG8eXD66dCsWdjujDOgpCTMd/jHPyItORauugo++SSsB9W8eWZf2wyOOAJm\nzIBJk6Bly7AMeZ8+4XG+jC9FQQEgWbH33uFUv9deg9/8Jupqvq+8PJy90qNHuNxls2Zh7OKdd+Dk\nk8M1kauqvCbCrrvCSSfBqlXR1B0Hb74ZxnjOPjv07RuLGfzkJzBrFjz4IKxdC8cdF8Zlpk5VEDRI\nXQ8VoripBZR/LrwwHMbff3/UlQRr17qPHRt6++Der5/7Y4/Vvb3wyivuTZq4n3xy49YZV+vXu/fu\n7d6+vfvKldndd3l5GBPo2DF87w49NIwZJB0aA5C4Wr/e/eCD3Vu1cp89O7o61qxxHz3avUOH8Fsw\nYEAYxKyoqP9r/c//hNe4997M1xl3f/pT+L8/+mh0Naxd6z5qVBhjAvejj3Z/663o6omaAkBibelS\n93btwsDqihXZ3ffq1e433eS+yy7hp/+gg9ynTGnYG3+lDRtCqG2z
jfuCBZmrNe4+/DAE+XHHRV1J\n8M037iNHuu+wQ/jennCC+3vvRV1V9tUnADQGIFm3yy7w8MPw8cdw2mlh6YDG9vXX8Kc/QadOYQ2a\nHj3gxRfDjNUf/ahh56xXKigIA8XNmoXxgDgPcmeKOwwbFsZGRo+Ouppg661rXnBO162uQV2TIoqb\njgDy26hR4S+1P/yh8faxcmU4L71167Cv//zP0LdvDBMnhn1cemnjvH6cjB8f/q+33BJ1JTVbvtz9\n4ovdW7Z0b9bM/eyzc+c05HRQjyMAC9vHU1FRkZeWlkZdhjQS93DGzQMPhL/WfvzjzL32l1+GM1Nu\nvjmsKVN56cr+/TO3j+oMGxbODpo6NZy6mI8+/zwcQXXtCq+8Ak1i3kdYsgSuvTacotq0KZx7LowY\nERaka0zu4Uylb7+t+bZmTfXPt2gRlsNoCDOb6e5FddpWASBRWr0a9t8//JLOnBlaNOkoKwsLit1y\nS2j7/PSncOWV4VoF2fDtt2El1C+/hDlzIB/XMzz99HAFuLffDi2WXLFoUViKevz4sFLphReG/8uG\nDVt+M67LG3ZN2zVEy5ZhLa0PPmjY5ysAJKcsXAhFRWE54FdeCb8A9bVsWVh98rbbwi9fSQlccQX0\n6pX5emszZ0440jjiCHjyyfTGF+Jm2rTw/7riirDefy6aNy+METzySP0+r0WLEBxVb61aff+5hmxT\nuV2rVukfUSkAJOc8+WSY5HPWWXDnnXX/vCVLwuDu7beHwdef/xx++9vQoojS6NEwfHhoQQ0fHm0t\nmbJmTQjUJk1CyDUkqONk9mx44426vWm3ahUG+3NBfQIg8oHeLd00CJwsV14ZBhbvuKP2bT/+OAzq\nNW/u3rSp+xlnuH/wQePXWFcVFe7HHhvqmzUr6moyY8SI8P154YWoK5EtQYPAkos2bgyXD5w+PbSC\n9tvv+9t8+CH88Y/wt7+F1soZZ4QBvc6ds15urcrKoHfvcBnEmTPDX5K5as6cMI4yZEhY0kPiqz5H\nADEfv5ckKSgIZwS1awcnnPDdSwPOnx/efLp3D+fcDxsWwmDcuHi++UMYAB4/PtR+4YVRV9NwGzeG\nyza2bh0u4yj5QwEgsbLjjuFykp99FiZVvfNO6OsXFobJY8OHh7M5Ro8OZ0rE3eGHw6WXhjGK+g46\nxsWYMaFXfvPNIQQkf6gFJLH017+GAWEIMzzPOy/M4N1552jraojycjjwQFiwIAw87rZb1BXV3Sef\nhPD9wQ9g8uT8OqMpX9WnBdS09k1Esu/MM2H58nDmyfDh4cggVzVrFpYv7tsXTjklLEGRC2eUuIdJ\nU+7h9Fq9+ecftYAktkaMCBN3cvnNv1KXLnDrrfDyy2FWai6YODFcB/maa9KfoCfxpAAQyZJTTgm3\nq68OZznF2cqV8OtfhzN/8mUeg3yfAkAki8aMCX9Nn3wyrFgRdTU1u+yy0IKrXD9H8pMCQCSLtt02\njAf861/wq1/F8zKGL78czlq68MLsraEk0VAAiGRZ//5hHZ2HHw5nO8XJunUwdGg4Srn66qirkcam\nABCJwG9+E+YIDB8O778fdTWbjBwZ6rnttnD6reQ3BYBIBJo0gXvvDYuMnXRS+Ms7avPmwXXXhYl3\nRx4ZdTWSDQoAkYi0bw933w2zZoVTXqNUURFaP9tsE66nIMmgABCJ0H/9V5jl/Je/wDPPRFfHnXeG\nU1NvvDE3Z1tLw2gpCJGIrV0bBoaXLQurbu6yS3b3v3RpuH5Cv37hgi+a8ZvbtBqoSA5p2RImTIBv\nvoHTTgvtmGw6//wQQuPG6c0/aRQAIjFQWBh671Onwk03ZW+/Tz4ZTke96qpwkXdJFrWARGLCPVwH\n4ckn4fXXYd99G3d/X38dgmf77cMFa5o3b9z9SXaoBSSSg8zC0gtt28LgweENujFdeWW4pvIdd+jN\nP6kUACIx0ro13H8/fPRRWIyt
sbzxRriozrnnwsCBjbcfiTcFgEjMHHwwXHFFuO7xgw9m/vXLy8Ml\nHtu3z52lqaVxKABEYuiqq+CAA8K1jxctyuxr33hjON10zJiwOJ0klwJAJIaaNg2tILOwNEN5eWZe\nd+HCsMjb8cfDoEGZeU3JXWkFgJm1NrOpZrYg9XGHarbpaGYvmtl7ZvaumZ2fzj5FkqJTp3Bu/owZ\nmVmZ0z0cUTRvDqNGpf96kvvSPQIYAUxz967AtNTjzW0ALnb3QmAgcK6ZFaa5X5FEOPHEcH3k664L\n1xJOx/jxYabv9deH/r9IWvMAzGw+cKi7LzWzdsB0d+9ey+dMAm5x96m1vb7mAYjA6tVhmYbVq2H2\n7IZdI7msLCz30L17uOBLEzV/81Y25wG0dfelqfvLgLa1FNYJ6Av8YwvbDDWzUjMrLSsrS7M8kdy3\n9dbhbKDly+Gssxp2FbGLLoJVq8I5/3rzl0q1/iiY2fNmNrea23eGkDwcStT4o2lm2wATgQvcfVVN\n27n77e5e5O5Fbdq0qcd/RSR/9esXLtYyaRKMHVu/z50yBe67Dy6/PMz8FamUlRaQmTUDngKec/c6\nr3SiFpDIJhUVcMwxMH06vPkm9OxZ++d8+23YrnnzcN2Bli0bvUyJWDZbQE8AQ1L3hwCTqinGgLuA\nefV58xeR72rSBO65B7bbLiwVsWZN7Z9z9dVhHsG4cXrzl+9LNwBGAj8yswXAEanHmFl7M5uc2uZA\n4FTgMDOblbodneZ+RRKpbdswQ/jdd+GSS7a87dtvh0lfv/gFHHJIduqT3KLVQEVy0MUXh2WjH3+8\n+gldGzfCgAGweHG41u8O35uhI/lKq4GK5LnrrgsDw2eeGd7kNzd6dFjiedQovflLzRQAIjmoRYtw\naui6dXDqqeEv/kr//GdY6vmYY6C4OLoaJf4UACI5qlu38Jf+9Olhdi+EOQLnnhvujxmjSzzKljWN\nugARabjTT4fnngurhx52WPjr/+mnw+Uld9896uok7jQILJLjVq6EPn3CaaLffgsdO4YF5AoKoq5M\nolCfQWAdAYjkuO23hwceCBeSAXj2Wb35S90oAETywAEHhOsHlJeHowGRulAAiOSJE0+MugLJNToL\nSEQkoRQAIiIJpQAQEUkoBYCISEIpAEREEkoBICKSUAoAEZGEUgCIiCRUrNcCMrMy4J8N/PSdgM8z\nWE6mqK76UV31o7rqJx/r2t3d29Rlw1gHQDrMrLSuCyJlk+qqH9VVP6qrfpJel1pAIiIJpQAQEUmo\nfA6A26MuoAaqq35UV/2orvpJdF15OwYgIiJbls9HACIisgV5EQBm9lczW25mc6s819rMpprZgtTH\nHbJcU0cze9HM3jOzd83s/JjU1dLM3jCz2am6ro5DXVXqKzCzt83sqbjUZWYfm9k7ZjbLzEpjVNf2\nZvaImb1vZvPMbP+Y1NU99bWqvK0yswuirs3MLkz9zM81swdTvwtx+Hqdn6rpXTO7IPVcVurKiwAA\n7gGO3Oy5EcA0d+8KTEs9zqYNwMXuXggMBM41s8IY1LUOOMzdewN9gCPNbGAM6qp0PjCvyuO41PVD\nd+9T5dS8ONR1M/Csu+8F9CZ83SKvy93np75WfYB9gW+Bx6Kszcx2BYYDRe7eEygABkdZU6qunsAv\ngf6E7+GxZrZn1upy97y4AZ2AuVUezwfape63A+ZHXN8k4EdxqgvYCngLGBCHuoAOqR/2w4Cn4vJ9\nBD4GdtrsuUjrArYDFpEax4tLXdXU+WPg1ahrA3YFPgVaE66E+FSqtqi/j8XAXVUe/w64NFt15csR\nQHXauvvS1P1lQNuoCjGzTkBf4B/EoK5Um2UWsByY6u6xqAv4C+GHv6LKc3Goy4HnzWymmQ2NSV2d\ngTLg7lTL7E4z2zoGdW1uMPBg6n5ktbn7EuAG4BNgKfCVu0+JsqaUucAPzGxHM9sKOBromK268j
kA\n/s1DjEZyupOZbQNMBC5w91VxqMvdN3o4PO8A9E8dhkZal5kdCyx395k1bRPh9/Gg1NfrKEIr7+AY\n1NUU6Afc5u59gdVs1iaI8ucewMyaAz8BHt7837JdW6qHPogQnO2Brc3slChrSu1zHnA9MAV4FpgF\nbMxWXfkcAJ+ZWTuA1Mfl2S7AzJoR3vzvd/dH41JXJXdfCbxIGD+Juq4DgZ+Y2cfABOAwM7svBnVV\n/vWIuy8n9LL7x6CuxcDi1NEbwCOEQIi6rqqOAt5y989Sj6Os7QhgkbuXuXs58ChwQMQ1AeDud7n7\nvu5+MLAC+CBbdeVzADwBDEndH0LowWeNmRlwFzDP3W+KUV1tzGz71P1WhHGJ96Ouy90vd/cO7t6J\n0DZ4wd1PibouM9vazP6j8j6hbzw36rrcfRnwqZl1Tz11OPBe1HVt5iQ2tX8g2to+AQaa2Vap383D\nCYPmkX+9zGzn1MfdgJ8BD2StrmwOeDTiQMqDhL5eOeEvo7OAHQkDiguA54HWWa7pIMJh2xzCYd0s\nQn8v6rr2Ad5O1TUXuCr1fKR1bVbjoWwaBI7667UHMDt1exe4Ig51pWroA5SmvpePAzvEoa5UbVsD\nXwDbVXku6u/l1YQ/duYC44EWUdeUqutlQnjPBg7P5tdKM4FFRBIqn1tAIiKyBQoAEZGEUgCIiCSU\nAkBEJKEUACIiCaUAEBFJKAWAiEhCKQBERBLq/wGBSZyDUoqtYQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x115009090>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "CH_scores_2 = []\n",
    "\n",
    "for K in Ks:\n",
    "    ch = K_cluster_analysis(K, X_train_2)\n",
    "    CH_scores_2.append(ch)\n",
    "    \n",
    "\n",
    "plt.plot(Ks, np.array(CH_scores_2), 'b-')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "### The standardized data scores worse than the raw data; not a good result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PCA dimensionality reduction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(8846, 45)"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Try reducing dimensionality with PCA\n",
    "from sklearn.decomposition import PCA\n",
    "pca = PCA(n_components=0.75)\n",
    "pca.fit(X_train_2)\n",
    "# Transform the same standardized data the PCA was fitted on\n",
    "X_train_pca = pca.transform(X_train_2)\n",
    "\n",
    "X_train_pca.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### After PCA, 45 feature columns remain, which is acceptable"
   ]
  },
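  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PCA learns its projection from the matrix it is fitted on, so the matrix passed to `transform` should be preprocessed the same way as the one passed to `fit`. The following is a minimal sketch on synthetic data, not the homework dataset; `X_demo`, `X_std` and `X_reduced` are hypothetical names introduced only for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.decomposition import PCA\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "rng = np.random.RandomState(0)\n",
    "# 200 samples, 20 features whose scales differ by column (assumption)\n",
    "X_demo = rng.rand(200, 20) * np.arange(1, 21)\n",
    "\n",
    "X_std = StandardScaler().fit_transform(X_demo)\n",
    "\n",
    "# A float n_components keeps the fewest components whose cumulative\n",
    "# explained variance exceeds that fraction (75% here)\n",
    "pca = PCA(n_components=0.75)\n",
    "X_reduced = pca.fit_transform(X_std)  # fit and transform the same matrix\n",
    "\n",
    "print(X_reduced.shape)\n",
    "print(pca.explained_variance_ratio_.sum())\n",
    "```\n",
    "\n",
    "Checking `explained_variance_ratio_.sum()` is a quick way to confirm that the retained components actually cover the requested share of variance."
   ]
  },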
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K-means begin with clusters: 10\n",
      "CH_score: 0.407747575148, time elaps:3\n",
      "K-means begin with clusters: 20\n",
      "CH_score: 0.225868108903, time elaps:2\n",
      "K-means begin with clusters: 30\n",
      "CH_score: 0.194917338218, time elaps:2\n",
      "K-means begin with clusters: 40\n",
      "CH_score: 0.111107637904, time elaps:2\n",
      "K-means begin with clusters: 50\n",
      "CH_score: 0.12658481054, time elaps:3\n",
      "K-means begin with clusters: 60\n",
      "CH_score: 0.0713703417849, time elaps:3\n",
      "K-means begin with clusters: 70\n",
      "CH_score: 0.0925752324437, time elaps:3\n",
      "K-means begin with clusters: 80\n",
      "CH_score: 0.0968045978296, time elaps:3\n",
      "K-means begin with clusters: 90\n",
      "CH_score: 0.0770482082383, time elaps:3\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x113abf290>]"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAD8CAYAAABw1c+bAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XuYFNWZx/HvyyCiIHKbaATlooDBrCBOEAnBC5EFlptx\n1CGta1YNi9GoMZtodM0TL8kmRhOz3naJaLIxgjCCjqh4QWPES3BQVBRQFBQxgUFRgyjXd/84PaEd\nB6Z7pruruvv3eZ55pqu6qvudGfhV9alT55i7IyIipaNV1AWIiEh+KfhFREqMgl9EpMQo+EVESoyC\nX0SkxCj4RURKjIJfRKTEKPhFREqMgl9EpMS0jrqAxnTt2tV79uwZdRkiIgVj0aJF6929PJ1tYxn8\nPXv2pLa2NuoyREQKhpm9le62auoRESkxCn4RkRKj4BcRKTEKfhGREqPgFxEpMQp+EZESo+AXESkx\nRRP8n3wC110H8+dHXYmISLwVTfC3aQPXXgs33hh1JSIi8ZZW8JvZKDNbbmYrzOyS3Wz3FTPbZmaV\nme7bUmVlUFUFDzwAGzbk6l1ERApfk8FvZmXATcBooD8wycz672K7XwAPZ7pvtiQSsGULVFfn6h1E\nRApfOmf8g4EV7v6mu28BZgATGtnuu8DdwLpm7JsVRx4JffvCH/+Yq3cQESl86QR/N2B1yvI7yXX/\nYGbdgBOBWzLdN5vMwln/E0/A6tVNby8iUoqydXH3euBid9/R3Bcws8lmVmtmtXV1dc0u5JvfDN+n\nT2/2S4iIFLV0gn8NcGDKcvfkulQVwAwzWwVUAjeb2cQ09wXA3ae6e4W7V5SXpzWkdKMOOQSGDFFz\nj4jIrqQT/M8Bfcysl5m1AaqAmtQN3L2Xu/d0955ANfAdd78nnX1zIZGAl16Cl1/O9TuJiBSeJoPf\n3bcB5wEPAUuBme7+iplNMbMpzdm35WXv3imnhO6dOusXEfk8c/eoa/iciooKb+kMXGPGwJIlsGoV\ntCqa29RERBpnZovcvSKdbYs2EhOJ0LNnwYKoKxERiZeiDf4JE2DvvdXcIyLSUNEGf/v2MHEizJoV\n7uYVEZGgaIMfQnPPhg3w4INRVyIiEh9FHfwnnADl5WruERFJVdTBv8cecOqpcN998OGHUVcjIhIP\nRR38EJp7Pv0UZs+OuhIRkXgo+uA/6ig4+GA194iI1Cv64DcLA7c99hi8+27U1YiIRK/ogx9Cc487\nzJgRdSUiItErieDv1y9M0qLmHhGREgl+CGf9zz8Py5ZFXYmISLRKJvirqsJgbTrrF5FSVzLB/8Uv\nwogRIfhjOCCpiEjelEzwQ2juWbkSnnkm6kpERKJTUsF/4onQtq2ae0SktJVU8HfoAOPHw8yZsHVr\n1NWIiESjpIIfQnPP+vXw8MNRVyIiEo2SC/5Ro6BzZzX3iEjpKrngb9MGTj4Z7r0XNm6MuhoRkfwr\nueAHOO002LQJ7rkn6kpERPIvreA3s1FmttzMVpjZJY08P8HMXjKzxWZWa2bDUp5bZWYv1z+XzeKb\na+hQ6NFDzT0iUpqaDH4zKwNuAkYD/YFJZta/wWbzgQHuPhA4E7i1wfPHuftAd6/IQs0t1qpVGLHz\n4Ydh7dqoqxERya90zvgHAyvc/U133wLMACakbuDuG93/cT9sOyD298YmErBjB9x1V9SViIjkVzrB\n3w1YnbL8TnLdZ5jZiWa2DLifcNZfz4FHzWyRmU3e1ZuY2eRkM1FtXV1detW3wGGHwYABau4RkdKT\ntYu77j7H3Q8FJgJXpTw1LNkENBo418yG72L/qe5e4e4V5eXl2SprtxIJWLgQXn89L28nIhIL6QT/\nGuDAlOXuyXWNcvc/A73NrGtyeU3y+zpgDqHpKBYmTQozdN15Z9SViIjkTzrB/xzQx8x6mVkboAqo\nSd3AzA4xM0s+HgTsCbxnZu3MbJ/k+n
bASGBJNn+AlujeHY45RiN2ikhpaTL43X0bcB7wELAUmOnu\nr5jZFDObktzsJGCJmS0m9AA6NXmxdz9ggZm9CCwE7nf3ebn4QZrrtNNCU09tLDqaiojknnkMT3Ur\nKiq8Nk9J/MEHsN9+cM45cP31eXlLEZGsM7NF6XaZL8k7d1N17Ahjx4aJ2Ldti7oaEZHcK/ngh9C7\nZ+1amD8/6kpERHJPwQ+MGQP77qs+/SJSGhT8hFm5KithzpwweJuISDFT8CclEmGY5pqaprcVESlk\nCv6kY46Bbt3U3CMixU/Bn1Q/Yue8eWFqRhGRYqXgT5FIhC6ds2ZFXYmISO4o+FMcfngYtVPNPSJS\nzBT8KczCWf9TT8HKlVFXIyKSGwr+Br75zfBdI3aKSLFS8DfQowcMG6YRO0WkeCn4G5FIwNKlsHhx\n1JWIiGSfgr8RJ58MrVvrIq+IFCcFfyO6dAnj90yfDtu3R12NiEh2Kfh3IZGAd9+FJ56IuhIRkexS\n8O/CuHGwzz5q7hGR4qPg34W99oJvfAOqq+HTT6OuRkQkexT8u5FIwEcfwdy5UVciIpI9Cv7dOP54\n2H9/NfeISHFJK/jNbJSZLTezFWZ2SSPPTzCzl8xssZnVmtmwdPeNs7IyqKqCBx6ADRuirkZEJDua\nDH4zKwNuAkYD/YFJZta/wWbzgQHuPhA4E7g1g31jLZGALVtCW7+ISDFI54x/MLDC3d909y3ADGBC\n6gbuvtH9HwMctAM83X3j7sgjoV8/NfeISPFIJ/i7AatTlt9JrvsMMzvRzJYB9xPO+tPeN87qR+x8\n4glYvbrp7UVE4i5rF3fdfY67HwpMBK7KdH8zm5y8PlBbV1eXrbKyon7EzunTo61DRCQb0gn+NcCB\nKcvdk+sa5e5/BnqbWddM9nX3qe5e4e4V5eXlaZSVPwcfDEOGwB13RF2JiEjLpRP8zwF9zKyXmbUB\nqoCa1A3M7BAzs+TjQcCewHvp7FsoEgl4+eXwJSJSyJoMfnffBpwHPAQsBWa6+ytmNsXMpiQ3OwlY\nYmaLCb14TvWg0X1z8YPk2imnhO6dusgrIoXOPIazjVRUVHhtbW3UZXzOmDGwZAmsWgWtdOubiMSI\nmS1y94p0tlV8ZSCRCD17FiyIuhIRkeZT8Gdg4kRo107NPSJS2BT8GWjXLoT/rFnhbl4RkUKk4M9Q\nIhHG7XnwwagrERFpHgV/hk44AcrL1dwjIoVLwZ+h1q3h1FOhpgY+/DDqakREMqfgb4ZEAjZvhtmz\no65ERCRzCv5mOOqoMIyDmntEpBAp+JvBLAzc9thj8O67UVcjIpIZBX8zJRLgDjNmRF2JiEhmFPzN\n1K8fVFSouUdECo+CvwUSCXj+eVi2LOpKRETSp+BvgaqqMFibzvpFpJAo+Ftg//1hxIgQ/DEc5FRE\npFEK/hZKJGDlSnjmmagrERFJj4K/hU48Edq2VXOPiBQOBX8LdegA48fDzJmwdWvU1YiINE3BnwWJ\nBKxfDw8/HHUlIiJNU/BnwahR0LmzmntEpDAo+LOgTZswGfu998LGjVFXIyKye2kFv5mNMrPlZrbC\nzC5p5PmEmb1kZi+b2dNmNiDluVXJ9YvNLH4zqGdJIgGbNsE990RdiYjI7jUZ/GZWBtwEjAb6A5PM\nrH+DzVYCx7j7PwFXAVMbPH+cuw9Mdwb4QjR0KPToAXfcEXUlIiK7l84Z/2Bghbu/6e5bgBnAhNQN\n3P1pd9+QXHwW6J7dMuOvVaswYucjj8DatVFXIyKya+kEfzdgdcryO8l1u3IWkDojrQOPmtkiM5uc\neYmFI5GAHTvgrruirkREZNeyenHXzI4jBP/FKauHuftAQlPRuWY2fBf7TjazWjOrraury2ZZeXPY\nYTBggHr3iEi8pRP8a4ADU5a7J9d9hpkdDtwKTHD39+rXu/ua5Pd1wBxC09HnuPtUd69w94ry8vL0\nf4
KYSSRg4UJ4/fWoKxERaVw6wf8c0MfMeplZG6AKqEndwMwOAmYDp7v7aynr25nZPvWPgZHAkmwV\nH0eTJoUZuu68M+pKREQa12Twu/s24DzgIWApMNPdXzGzKWY2JbnZj4EuwM0Num3uBywwsxeBhcD9\n7j4v6z9FjHTvDsceqxE7RSS+zGOYThUVFV5bW7hd/qdNg7PPDk0+X/lK1NWISCkws0XpdpnXnbs5\ncNJJ4W5eXeQVkThS8OdAx44wdixMnw7btkVdjYjIZyn4cySRgHXrYP78qCsREfksBX+OjBkD++6r\n5h4RiR8Ff460bQuVlTBnThi8TUQkLhT8OXTaaWGY5pqaprcVEckXBX8ODR8e+vWruUdE4kTBn0Ot\nWoU7eefNC1MziojEgYI/xxKJ0KXzF7+AzZujrkZERMGfc4cfDuPGwbXXwsEHw29+Ax9/HHVVIlLK\nFPw5Zhbm4n34YTjkELjwQujZE376U/jgg6irE5FSpODPAzM44QT4059gwQIYPBj+8z/DVI2XXhpu\n9BIRyRcFf5599atw//3wwgvwz/8MP/95+ARwwQWwenWTu4uItJiCPyIDB8LMmbB0KVRVwc03h2sA\nZ5+tSVxEJLcU/BHr1w9uuw1WrIB///fQ5//QQ8PB4KWXoq5ORIqRgj8mevSAG26AVavgBz+ABx4I\n8/eOGwfPPBN1dSJSTBT8MbPffqHd/6234Mor4emnYehQOP54ePRRzeolIi2n4I+pTp3g8svDAeC6\n62DZstAzaMiQ0D10x46oKxSRQqXgj7n27eGii2DlSvjf/w1DP0ycGJqB7rxTE72ISOYU/AVizz1h\n8mRYvhzuuCOc8ScS4ULwb3+r4SBEJH0K/gLTunUI/JdfDmP9d+oUDgi9e8Ovf63hIESkaWkFv5mN\nMrPlZrbCzC5p5PmEmb1kZi+b2dNmNiDdfaV5WrUKTT4LF4bhIPr2DU1CPXrA1VdrOAgR2bUmg9/M\nyoCbgNFAf2CSmfVvsNlK4Bh3/yfgKmBqBvtKC9QPB/H44/DUU+Hi7+WXw0EHwY9+BGvXRl2hiMRN\nOmf8g4EV7v6mu28BZgATUjdw96fdfUNy8Vmge7r7SvYMHQpz54bhIEaPDkNB9+wJ558Pb78ddXUi\nEhfpBH83IHUUmXeS63blLODBTPc1s8lmVmtmtXV1dWmUJbsycCDcdVfoAjppEtxySxgO4swz4bXX\noq5ORKKW1Yu7ZnYcIfgvznRfd5/q7hXuXlFeXp7NskpW375hOIg33oBzzoHp00MvoFNPhRdfjLo6\nEYlKOsG/BjgwZbl7ct1nmNnhwK3ABHd/L5N9JbcOOgj++7/DcBAXXwwPPhg+FYwdC6++GnV1IpJv\n6QT/c0AfM+tlZm2AKqAmdQMzOwiYDZzu7q9lsq/kz377wX/9V7gb+KqrwhhAI0dqPgCRUtNk8Lv7\nNuA84CFgKTDT3V8xsylmNiW52Y+BLsDNZrbYzGp3t28Ofg7JQKdOYSKY+fPhvffCdYDt26OuSkTy\nxTyGo35VVFR4bW1t1GWUhNtvDxd9L700TAcpIoXJzBa5e0U62+rO3RL3b/8WJn/52c9CV1ARKX4K\nfuGGG2DQIDj9dHjzzairEZFcU/ALbdtCdXV4XFkJn3wSbT0iklsKfgGgVy/4wx/CXb/f/W7U1YhI\nLin45R/GjoXLLoNp08KNXyJSnBT88hlXXAEjRsC554azfxEpPgp++YyysjC0Q5cuob1/w4am9xGR\nwqLgl88pL4dZs8KInmecofl9RYqNgl8adfTR8KtfwX33wTXXRF2NiGSTgl926bzzoKoqXPB97LGo\nqxGRbFHwyy6ZhYnc+/ULB4A1GldVpCgo+GW32reHu++GTZvglFNg69aoKxKRllLwS5O+9KXQt//p\np+GHP4y6GhFpKQW/pOXUU+GCC+D662HmzKirEZGWUPBL2q65JvT2
OessWLo06mpEpLkU/JK2Nm3C\n2f5ee8FJJ8HGjVFXJCLNoeCXjHTvHu7sXb4cvv1tiOE8PiLSBAW/ZGzECLj6apgxA266KepqRCRT\nCn5plosvhnHj4KKLwqTtIlI4FPzSLK1awe9/H5p+Tj4Z6uqirkhE0pVW8JvZKDNbbmYrzOySRp4/\n1MyeMbPNZvYfDZ5bZWYvm9liM9MM6kWkU6dwc9f69TBpEmzfHnVFIpKOJoPfzMqAm4DRQH9gkpn1\nb7DZ+8D5wLW7eJnj3H1gujPAS+E44gi4+WaYPx9+8pOoqxGRdKRzxj8YWOHub7r7FmAGMCF1A3df\n5+7PAbqhvwSdeWbo23/11XD//VFXIyJNSSf4uwGrU5bfSa5LlwOPmtkiM5ucSXFSOG64AQYOhNNO\ng5Uro65GRHYnHxd3h7n7QEJT0blmNryxjcxsspnVmlltna4UFpy99grt/RBm7vr002jrEZFdSyf4\n1wAHpix3T65Li7uvSX5fB8whNB01tt1Ud69w94ry8vJ0X15ipHdv+L//g+efh/PPj7oaEdmVdIL/\nOaCPmfUyszZAFVCTzoubWTsz26f+MTASWNLcYiX+xo2DSy8N4/jffnvU1YhIY1o3tYG7bzOz84CH\ngDLgNnd/xcymJJ//HzPbH6gFOgA7zOxCQg+grsAcM6t/rzvdfV5ufhSJiyuvhGefhe98J/T6GTgw\n6opEJJV5DAdbqaio8NpadfkvZOvWhdBv2xYWLYKOHaOuSKS4mdmidLvM685dyYkvfAFmzYK334Yz\nzoAdO6KuSETqKfglZ4YOheuug5oa+OUvo65GROop+CWnvvvdMFfvpZfC449HXY2IgIJfcswMbr0V\n+vaFqipYk3ZHYBHJFQW/5Nw++4Sbuz7+OMzdu1UDe4hESsEvedG/fzjzf+qpMJa/iERHwS95U1UV\n2vx//evQ40dEoqHgl7y69loYMiSM6LlsWdTVpG/7dnjxRU04I8VBwS951aZNONtv2xZOOgk2boy6\nosZ9+ik8+ST87GcwejR07hzuQD70UPjzn6OuTqRlFPySd927h4naly2DyZMhDjePf/QRzJsHl10G\nw4eHO42HDw/Lb78dZhibNg3Ky+HrXw+D0YkUqibH6hHJhREj4KqrQrB+9atw7rn5ff9168IZ/ZNP\nhjP4F18MdxeXlcGgQaGer30Nhg2Drl137nfiieGTyhlnwIoVcMUVocuqSCHRWD0SmR07YMIEeOih\nEL5DhuTmfdxh1arPBv1rr4Xn2rYN7zt8eAj6IUOgffvdv96WLXDOOXDbbeGC9e23h9cRiVImY/Xo\njF8i06pVaDIZNAhOPjmM45+NqRh27IBXX90Z8k8+ufPGsY4dw1n8WWeFoD/yyHDdIRNt2uy8Ke2S\nS+Ctt+Cee8L4RCKFQMEvkerUKdzcNXQoJBLw4IOhuSUTW7eGg0Z9yD/1FLz/fnjugANCwNd/ffnL\n4YDTUmbhfoSDD4bTTw+fFObODfcriMSdgl8iN2gQ3HQTnH12aDO/8srdb79pUxjvvz7on302rAPo\n0wcmTtwZ9L1757YNvrISDjoIxo8PB6/q6nDxVyTO1MYvsXHmmaG9fO5c+Jd/2bn+/fdhwYKdbfSL\nFsG2bSHQBwwIAT98eGjC2X//aGp/6y0YOxaWLoVbboFvfzuaOqR0ZdLGr+CX2PjkEzj66NB98pe/\nDAH/5JOwJDlZZ5s2MHjwzrP5oUNh332jrTnVRx+FsYjmzYMf/AB+/vPsNCuJpEPBLwXrjTfCBdcP\nPwyDuw0dujPoBw+Of++ZbdvCRPO33BK6fv7hD9CuXdRVSSlQrx4pWAcfDM89F86eBwyA1gX2L7R1\n63C9ol8/+N734JhjwkQ0BxwQdWUiO+mDqMROnz7hrL/QQr+eGVxwAdx7b7g7+aijwg1iInGRVvCb\n2SgzW25mK8zskkaeP9TMnjGz
zWb2H5nsK1Ksxo0L1yjcw4XnBx6IuiKRoMngN7My4CZgNNAfmGRm\nDXsrvw+cD1zbjH1FitYRR8Bf/hI+xYwbBzfeGHVFIumd8Q8GVrj7m+6+BZgBTEjdwN3XuftzQMO5\nlZrcV6TYdesW7jkYOzbMR3D++WGYZ5GopBP83YDVKcvvJNeloyX7ihSN9u1h9my46CK44YYwRtHf\n/x51VVKqYnNx18wmm1mtmdXWabYLKUJlZXDddaGr57x5od1/9eqm9xPJtnT6TawBDkxZ7p5cl460\n93X3qcBUCP3403x9kYIzZUoYSuLkk8O9CffdBxVp9b6WXdm8GT74ADZsCF+pjxsuf/ABdOgQmt7G\njo3ubu8opRP8zwF9zKwXIbSrgG+m+fot2VekaI0cGQaTGzs2DDdx551hjKFS5Q4ff9x4SO8uwOsf\nf/LJ7l9/773DgICdOoURWhcvDt1tIXS3HT8+fB12WGnMr5DWnbtmNga4HigDbnP3n5rZFAB3/x8z\n2x+oBToAO4CNQH93/6ixfZt6P925K6Vi7drQ3r9wIVxzDXz/+8UTPK+9Bi+9lH6Ab9u2+9fbd9/P\nhnf944bLjT3XcOht9zAUSE1N+Fq4MKzv3XvnQWDYMNhjj9z8bnJBQzaIFJBPPgkzes2aFaaivPHG\nwgqceu5hHoTq6vBVP8ZSvbKy9IK7seUOHTIfrjsTf/1rGBzw3nvh0UdD01HHjjBmTDgIjBoVr3Gh\nGqPgFykwO3bA5ZeHyd2//vVwEOjYMeqqmuYe7kqurg7zKixbFj6xDBsWhqwePhy6dAk/S/v2hfFp\n5uOP4ZFHwieBuXOhri7cRX7ssTs/DfToEXWVn6fgFylQv/tdOOs/5BC4/37o1Svqij7PPYycWn9m\n/8YbYRTSY48NYX/iicVzwXT79nADXn2T0NKlYf2AATsPAoMGxWMUVgW/SAH705/gG98IZ5n33huG\nqo7ajh0hAO++O4T9W2+FppcRI0LYT5yYnWkz4+7113ceBBYsCL+XAw4Id2WPHw/HHx/dCLIKfpEC\nt3x5mIzmnXfg978P4/zn2/bt8PTTO5tx1qwJ1x5GjgxhP348dO6c/7ri4r33wvhLNTXhvoyNG8MQ\n3CNHhgv2Y8bk92Co4BcpAuvXhzP/J5+Eq66Cyy7LfRv5tm3h/aqrw53Gf/sb7LlnuLhZWRm6nxbC\ntYd827wZHn9856eBNWtC88/QoTubhPr1y20NCn6RIrF5c5iL+I474F//FaZODUGcTVu3htCqroY5\nc8IBZ6+9wieOyspw5rrPPtl9z2LmDi+8sPMg8MILYX3fvjsPAkOHZr+XkoJfpIi4w9VXw49/HHrJ\nzJ4desq0xObNMH9+CPt77gn96Nu3D2f0lZXhDF8zh2XH22+H3kE1NfDYY+FA26VL+F2PHx+ahtq3\nb/n7KPhFitD06fCtb8FBB4UeP337Zrb/p5/CQw+F9vqamjC9ZYcOoT26sjIEUNyntix0H30U/gY1\nNeFvuGFDuLlsxIhwEBg3Lozm2hwKfpEi9dRToQfNjh3hzP+YY3a//aZN8OCD4cx+7txwAbJTp/Aa\nlZUhcLLddCTp2bYt/D1rakLvrTfeCAfi9eubdwOfgl+kiL35Zmh/f+MNuPXW0Paf6u9/D71NqqvD\n902boGvXcKG4sjL0ty/EO4OLmXu4+W358uaP2aTJ1kWKWO/e8MwzIcTPOCP0Lf/+98MZfXV16Fq4\neXO4iepb3wrbfe1rhTuHcSkwgy99KXzl5f10xi9SmLZuhXPOgWnTQnC4h/bhysrwdfTRuR3fRuJF\nZ/wiJWCPPeC3v4UhQ2DFitBEMHhwPIYPkHhT8IsUMLPQz18kEzo3EBEpMQp+EZESo+AXESkxCn4R\nkRKj4BcRKTEKfhGREqPgFxEpMQp+EZESE8shG8ysDnirmbt3BdZnsZxsUV2ZUV2ZUV2ZKca6er
h7\nWpM9xjL4W8LMatMdryKfVFdmVFdmVFdmSr0uNfWIiJQYBb+ISIkpxuCfGnUBu6C6MqO6MqO6MlPS\ndRVdG7+IiOxeMZ7xi4jIbhR08JvZbWa2zsyWpKzrbGaPmNnrye+d8lzTgWb2uJm9amavmNkFMamr\nrZktNLMXk3VdEYe6UuorM7MXzGxuXOoys1Vm9rKZLTaz2hjV1dHMqs1smZktNbOjY1JXv+Tvqv7r\nIzO7MOrazOx7yX/zS8xsevL/Qhx+Xxcka3rFzC5MrstLXQUd/MDvgFEN1l0CzHf3PsD85HI+bQO+\n7+79gSHAuWbWPwZ1bQaOd/cBwEBglJkNiUFd9S4AlqYsx6Wu49x9YEoXuzjU9RtgnrsfCgwg/N4i\nr8vdlyd/VwOBI4FNwJwoazOzbsD5QIW7fxkoA6qirClZ15eBbwODCX/DsWZ2SN7qcveC/gJ6AktS\nlpcDX0w+/iKwPOL67gVOiFNdwN7A88BRcagL6J78R348MDcuf0dgFdC1wbpI6wL2BVaSvD4Xl7oa\nqXMk8FTUtQHdgNVAZ8KMg3OTtUX9dzwZmJayfDnww3zVVehn/I3Zz93/mnz8N2C/qAoxs57AEcBf\niEFdyeaUxcA64BF3j0VdwPWEf/Q7UtbFoS4HHjWzRWY2OSZ19QLqgNuTTWO3mlm7GNTVUBUwPfk4\nstrcfQ1wLfA28FfgQ3d/OMqakpYAXzOzLma2NzAGODBfdRVj8P+Dh8NmJN2WzKw9cDdwobt/FIe6\n3H27h4/h3YHByY+bkdZlZmOBde6+aFfbRPh3HJb8fY0mNNkNj0FdrYFBwC3ufgTwMQ2aA6L8dw9g\nZm2A8cCshs/lu7ZkG/kEwgHzAKCdmZ0WZU3J91wK/AJ4GJgHLAa256uuYgz+tWb2RYDk93X5LsDM\n9iCE/h/dfXZc6qrn7h8AjxOuj0Rd11eB8Wa2CpgBHG9md8SgrvqzRdx9HaGtenAM6noHeCf5aQ2g\nmnAgiLquVKOB5919bXI5ytq+Dqx09zp33wrMBoZGXBMA7j7N3Y909+HABuC1fNVVjMFfA5yRfHwG\noY09b8zMgGnAUnf/VYzqKjezjsnHexGuOyyLui53/5G7d3f3noTmgcfc/bSo6zKzdma2T/1jQrvw\nkqjrcve/AavNrF9y1Qjg1ajramASO5t5INra3gaGmNneyf+bIwgXwyP/fZnZF5LfDwK+AdyZt7ry\neUEjBxdIphPa7bYSzoTOAroQLhS+DjwKdM5zTcMIH89eInx8W0xov4u6rsOBF5J1LQF+nFwfaV0N\najyWnRf4JO6JAAAAfElEQVR3o/599QZeTH69AlwWh7qSNQwEapN/y3uATnGoK1lbO+A9YN+UdVH/\nLa8gnOQsAf4A7Bl1Tcm6niQctF8ERuTzd6U7d0VESkwxNvWIiMhuKPhFREqMgl9EpMQo+EVESoyC\nX0SkxCj4RURKjIJfRKTEKPhFRErM/wNndpKVreGFAQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x113b92c90>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "CH_scores_pca = []\n",
    "\n",
    "for K in Ks:\n",
    "    ch = K_cluster_analysis(K, X_train_pca)\n",
    "    CH_scores_pca.append(ch)\n",
    "    \n",
    "\n",
    "plt.plot(Ks, np.array(CH_scores_pca), 'b-')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K-means begin with clusters: 10\n",
      "CH_score: 0.348228268317, time elaps:3\n",
      "K-means begin with clusters: 12\n",
      "CH_score: 0.325983908228, time elaps:3\n",
      "K-means begin with clusters: 14\n",
      "CH_score: 0.297797297634, time elaps:3\n",
      "K-means begin with clusters: 16\n",
      "CH_score: 0.25134864419, time elaps:2\n",
      "K-means begin with clusters: 18\n",
      "CH_score: 0.245690988453, time elaps:4\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x114e57d50>]"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAD8CAYAAABw1c+bAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHQNJREFUeJzt3Xl4VOXZx/HvTQAXBLVNbBWQRSJILQFNBQVRQVvUYkSr\ngKJoVV583evVVkFAi1rr1retWxEVq1RFRYtat1or4h4QkVUiLoBaEbdqFQjc7x/PRCYhyARm8szy\n+1zXXMzMOVPuRPo7Z879nOcxd0dERApHk9gFiIhI41Lwi4gUGAW/iEiBUfCLiBQYBb+ISIFR8IuI\nFBgFv4hIgVHwi4gUGAW/iEiBaRq7gPoUFxd7+/btY5chIpIzZs6c+ZG7l6Syb1YGf/v27amsrIxd\nhohIzjCzd1LdV5d6REQKjIJfRKTAKPhFRAqMgl9EpMAo+EVECoyCX0SkwCj4RUQKTF4F//jx8MIL\nsasQEclueRP8n34Kf/4z7LcfDBkCb78duyIRkeyUN8G/ww6waBGMGwfTpkGXLnDhhfD557ErExHJ\nLnkT/AAtWsDFF8Mbb4Sz/iuugE6dwjeB6urY1YmIZIe8Cv4abdrApElQWQl77AEjR0KPHvDEE7Er\nExGJLy+Dv8bee8O//gVTp8JXX8FPfgKHHQbz58euTEQknrwOfgAzGDQI5s2Da66B55+Hbt3gf/8X\nVqyIXZ2ISOPL++CvsdVW8ItfQFUVnH46TJgQrv9fdRWsWhW7OhGRxlMwwV+juBj+9CeYOxf69oVf\n/Sr0Ae67D9xjVyciknkFF/w1unSBhx6CJ5+Eli3hmGNg//3h5ZdjVyYiklkFG/w1Dj4YZs2Cm28O\nl4F69oRhw2Dp0tiViYhkRsEHP0BREZx6KixeDKNGhcs+u+8OY8bAF1/Erk5EJL0U/ElatoTLLgt3\nAB91FFx6KZSWwi23wNq1sasTEUmPlILfzAaY2SIzqzKzC+rZXmFmc8xstplVmlmfOtuLzOxVM3s4\nXYVnUrt2MHlymPCtQ4fwbWDvveGf/4xdmYjIlttk8JtZEXA9cCjQFRhqZl3r7PYUUObu3YGfAxPr\nbD8HWLDl5TauXr3guefgnnvgs8+gf3844ojwjUBEJFelcsa/D1Dl7kvcfTVwN1CRvIO7f+H+zWDI\nFsA3AyPNrA1wOBseDHKCGRx7LCxYEOb++de/YM894ZxzYOXK2NWJiDRcKsHfGkge47Is8V4tZjbI\nzBYCjxDO+mv8H/ArYN0W1Bnd1lvDr38dRv6ceipcd124Aez3v4fVq2NXJyKSurQ1d939AXfvAhwJ\njAcws58CH7r7zE193sxGJPoDlSuyeC6FnXaCG2+E116DffYJdwP/4Afw4IO6AUxEckMqwb8caJv0\nuk3ivXq5+3Sgo5kVA72BI8zsbcIlon5mdudGPjfB3cvdvbykpCTV+qPZc094/HF49FFo3jzMB9Sv\nX7gnQEQkm6US/K8ApWbWwcyaA0OAack7mFknM7PE872ArYCV7n6hu7dx9/aJz/3T3Yel9SeIbMCA\ncPZ/ww1hGojycjj5ZFi+0UOjiEhcmwx+d68GzgQeJ4zMmeLu88xspJmNTOx2NDDXzGYTRgANTmr2\n5r2mTcPEb1VV8Mtfwl//Gm4Au+QS+PLL2NWJiNRm2ZjP5eXlXllZGbuMzbZkCVxwAdx7L7RuDZdf\nHqaBaKLb5UQkQ8xspruXp7KvoigDOnaEKVNgxgzYZRcYPhx+9COYPj12ZSIiCv6M6t0bXnwR7rwT\nPvwQDjgAjj46XBISEYlFwZ9hTZrA8ceHu30vvTSMBOraFc4/Hz75JHZ1IlKIFPyNZNttYfToMAPo\niSeGG79KS8ONYGvWxK5ORAqJgr+R7bwzTJwIr74KZWVw1lnwwx/Cww/rBjARaRwK/kjKyuAf/4Bp\n00LgDxwIhxwS7gkQEckkBX9EZiHw58
6FP/4xfAvo0QNOOw0++CB2dSKSrxT8WaBZs3DJp6oKzj0X\nbr89XP+//HL46qvY1YlIvlHwZ5Edd4Rrr4V588Jln9GjoXPncCfwupye21REsomCPwuVlsLUqfD0\n01BcHIaD7rtvWBRGRGRLKfiz2IEHQmUl3HYbLF0KffrA4MHw1luxKxORXKbgz3JNmsBJJ4Xx/+PG\nwUMPQZcuYVGYzz6LXZ2I5CIFf45o0QIuvjgcAIYOhSuvDJeEbroJqqtjVyciuUTBn2Nat4ZJk8Il\noD32CNNBl5XBY4/FrkxEcoWCP0ftvXdY+H3qVFi1Cg49NDzmzYtdmYhkOwV/DjMLSz7OmwfXXAMv\nvADduoVvAR9+GLs6EclWCv48sNVWYdH3qio44wy4+eZw/f/KK+Hrr2NXJyLZRsGfR4qLw9QPc+dC\n375h5M8ee4RFYTQBnIjUUPDnoS5dwrDPJ5+EVq3C2P8+feDll2NXJiLZQMGfxw4+GGbNCpd+3nwT\nevYMdwG/+27sykQkJgV/nisqglNPDeP/R42C++8P8/+MHg3/+U/s6kQkBgV/gWjZEi67LCwBedRR\nYebP0tIwAZyIFBYFf4Fp1w4mTw6LwLdvHy79HH+8pn8QKSQK/gLVsyfMmAGXXAL33APdu2v2T5FC\noeAvYE2bwtix8Oyz4Wawvn3DfECa+0ckvyn4hX33hdmzwyWfSy4JBwBN/SySvxT8AoTx/n/5S2j2\nzpsXJn67887YVYlIJij4pZahQ+G118KcPyecoMavSD5S8MsG2rcPM3/+5jdq/IrkIwW/1KtpUxgz\nRo1fkXyUUvCb2QAzW2RmVWZ2QT3bK8xsjpnNNrNKM+uTeL+tmT1tZvPNbJ6ZnZPuH0AyS41fkfyz\nyeA3syLgeuBQoCsw1My61tntKaDM3bsDPwcmJt6vBs53965AL+CMej4rWU6NX5H8ksoZ/z5Albsv\ncffVwN1ARfIO7v6F+zcT/7YAPPH+++4+K/H8P8ACoHW6ipfGVdP4LStT41ckl6US/K2BpUmvl1FP\neJvZIDNbCDxCOOuvu7090AN4aXMKlezQvj08/fT6xm9ZmRq/Irkmbc1dd3/A3bsARwLjk7eZ2XbA\n/cC57v55fZ83sxGJ/kDlihUr0lWWZEBy47dJk3Ddf9w4NX5FckUqwb8caJv0uk3ivXq5+3Sgo5kV\nA5hZM0LoT3b3qd/yuQnuXu7u5SUlJSkVL3ElN35/85twAFiyJHZVIrIpqQT/K0CpmXUws+bAEGBa\n8g5m1snMLPF8L2ArYGXivVuABe5+bXpLl2yQ3PidPz+M+VfjVyS7bTL43b0aOBN4nNCcneLu88xs\npJmNTOx2NDDXzGYTRgANTjR7ewMnAP0SQz1nm9lhGflJJCo1fkVyh3kWrsJdXl7ulZWVscuQzVBd\nDb/9bRjz36ZNmPu/d+/YVYnkPzOb6e7lqeyrO3clrWoavzNmqPErkq0U/JIRvXqFxu+wYaHxu//+\navyKZAsFv2RMq1Zw++1w112wYIEavyLZQsEvGTdkiBq/ItlEwS+Nol27MNXz+PG641ckNgW/NJqi\nIrjootD4LSpS41ckFgW/NLpeveDVV9X4FYlFwS9R1Nf4veMOyMLbSkTyjoJfokpu/J54ohq/Io1B\nwS/RJTd+p0wJB4EZM2JXJZK/FPySFeo2fg84AMaOVeNXJBMU/JJVkhu/48er8SuSCQp+yTpq/Ipk\nloJfslZ9jd9PP41dlUjuU/BLVqvb+O3eXY1fkS2l4Jesp8avSHop+CVn1DR+TzhBjV+RLaHgl5zS\nqhVMmgR3363Gr8jmUvBLTho8ODR+u3dX41ekoRT8krPatYOnn1bjV6ShFPyS02oav889p8avSKoU\n/JIXevYMa/yq8SuyaQp+yRstW6rxK5IKBb/knbqN3+OOU+NXJJmCX/JSTeP30kvh3nvV+BVJpuCX\nvF
VUBKNHq/ErUpeCX/JefY3fN9+MXZVIPAp+KQj1NX7/8hc1fqUwKfiloAweDHPmQI8eMHy4Gr9S\nmBT8UnB23bV247esDJ59NnZVIo0npeA3swFmtsjMqszsgnq2V5jZHDObbWaVZtYn1c+KxJDc+G3W\nDA48EMaMgTVrYlcmknmbDH4zKwKuBw4FugJDzaxrnd2eAsrcvTvwc2BiAz4rEk3PnmGq5xNPDN8A\n1PiVQpDKGf8+QJW7L3H31cDdQEXyDu7+hfs3bbIWgKf6WZHYWraE224Ljd+FC9X4lfyXSvC3BpYm\nvV6WeK8WMxtkZguBRwhn/Sl/NvH5EYnLRJUrVqxIpXaRtFLjVwpF2pq77v6Au3cBjgTGb8bnJ7h7\nubuXl5SUpKsskQapafxedpkav5K/Ugn+5UDbpNdtEu/Vy92nAx3NrLihnxXJBkVFMGoUPP+8Gr+S\nn1IJ/leAUjPrYGbNgSHAtOQdzKyTmVni+V7AVsDKVD4rkq322WfDxq+mepZ8sMngd/dq4EzgcWAB\nMMXd55nZSDMbmdjtaGCumc0mjOIZ7EG9n83EDyKSCTWN33vuCY3fvn3hvfdiVyWyZcyzcOhCeXm5\nV1ZWxi5DpJbXXoM+faBzZ3jmGWjRInZFIuuZ2Ux3L09lX925K5KisjK4665w+WfYMFi3LnZFIptH\nwS/SAD/9KVx7LTz4IFx4YexqRDZP09gFiOSas8+GN96AK6+E0lI49dTYFYk0jIJfpIHM4A9/CFM7\nnH46dOwI/frFrkokdbrUI7IZmjYNI306d4ajjw4jfkRyhYJfZDNtvz08/DA0bx6u/X/0UeyKRFKj\n4BfZAu3bw9/+BsuWwaBBsGpV7IpENk3BL7KFevWC22+HGTNCozcLb40RqUXNXZE0GDwYFi8Oc/p0\n7gwXXRS7IpGNU/CLpMno0WGY55gxYZjn4MGxKxKpny71iKSJGdx8c5jWYfhweOGF2BWJ1E/BL5JG\nW20FDzwAbdpARQW8/XbsikQ2pOAXSbPi4jDMc82aMMzzs89iVyRSm4JfJAO6dIH774dFi+DYY6G6\nOnZFIusp+EUypF8/uOkmeOKJML+PhnlKttCoHpEMOuWU9RO6de4M55wTuyIRBb9Ixv32t2GM/y9+\nAbvtFq77i8SkSz0iGdakCdxxB/ToAUOGhJW8RGJS8Is0ghYtYNo02HHHcMb//vuxK5JCpuAXaSS7\n7AIPPQSffAIDB8KXX8auSAqVgl+kEXXvHtbtnTULTjxR6/ZKHAp+kUY2cGBYt3fqVBg1KnY1Uog0\nqkckgnPOCTd3/e53YUK3U06JXZEUEgW/SARm8Mc/wpIlMHJkWLf3oINiVyWFQpd6RCJp1gymTIHd\nd4ejjgrfAEQag4JfJKKadXubNYPDD9e6vdI4FPwikXXosH7d3qOO0rq9knkKfpEssO++MGkSPPss\njBihCd0ks9TcFckSQ4aEOX3Gjg3X/UePjl2R5KuUzvjNbICZLTKzKjO7oJ7tx5vZHDN73cyeN7Oy\npG3nmdk8M5trZneZ2dbp/AFE8slFF8GwYeHPKVNiVyP5apPBb2ZFwPXAoUBXYKiZda2z21vAAe7+\nQ2A8MCHx2dbA2UC5u+8JFAFD0le+SH4xg4kT16/b+9JLsSuSfJTKGf8+QJW7L3H31cDdQEXyDu7+\nvLt/knj5ItAmaXNTYBszawpsC7y35WWL5K+adXt32QWOOELr9kr6pRL8rYGlSa+XJd7bmFOARwHc\nfTlwNfAu8D7wmbs/sXmlihSO4mJ45JEwwmfgQPj889gVST5J66geMzuIEPy/TrzekfDtoAOwC9DC\nzIZt5LMjzKzSzCpXrFiRzrJEclLNur0LF8LgwVq3V9InleBfDrRNet0m8V4tZtYNmAhUuPvKxNsH\nA2+5+wp3XwNMBfar7y9x9wnuXu7u5SUlJQ35GUTyVv/+cMMN8Nhj
YX4fDfOUdEhlOOcrQKmZdSAE\n/hDguOQdzGxXQqif4O5vJG16F+hlZtsCXwH9gcp0FC5SKE47Lazbe/XVYd3es8+OXZHkuk0Gv7tX\nm9mZwOOEUTm3uvs8MxuZ2H4TMBb4LnCDmQFUJ87eXzKz+4BZQDXwKokRPyKSuiuugKoqOO+8sG7v\n4YfHrkhymXkWfncsLy/3ykp9MRBJ9uWX0LdvOPt/7jno1i12RZJNzGymu5ensq+mbBDJES1ahKUb\nt99e6/bKllHwi+SQmnV7P/44jPH/739jVyS5SMEvkmN69Ajr9s6cqXV7ZfMo+EVy0MCBcM01YZy/\nJnOThtLsnCI56txzw6pdV1wRZvM8+eTYFUmuUPCL5Cgz+NOfwrq9I0aEBV0OPDB2VZILdKlHJIfV\nrNtbWhpW73rjjU1/RkTBL5LjdtghTOjWtGm4sWvlyk1/Rgqbgl8kD3ToAA8+CEuXhjP/1atjVyTZ\nTMEvkif22w9uuw2mT9e6vfLt1NwVySNDh4br/BdfHEb6jBoVuyLJRgp+kTwzdmwI/9GjQ9P3mGNi\nVyTZRpd6RPKMGdxyC/TuHe7s1bq9UpeCXyQPbb31+nV7KyrgnXdiVyTZRMEvkqdKSuDhh+Hrr8Ns\nnlq3V2oo+EXy2B57wH33wYIFMGSI1u2VQMEvkucOPhhuvBEefTSs4CWiUT0iBeC008KEbtdcE4Z5\nnnVW7IokJgW/SIH43e/Cur3nnhvW7T3ssNgVSSy61CNSIIqKYPJkKCuDwYNhzpzYFUksCn6RAlKz\nbm+rVmGkzwcfxK5IYlDwixSY1q1D+K9cGcb4a93ewqPgFylAe+0Ff/0rvPIKDB+udXsLjYJfpEBV\nVMDVV4dx/mPGxK5GGpNG9YgUsPPOC8M8L788TOh20kmxK5LGoOAXKWBmcN11tdftPeCA2FVJpulS\nj0iBa9YM7r0XOnWCQYO0bm8hUPCLCDvsECZ0KyoKwzw//jh2RZJJCn4RAaBjx7Bu7zvvaN3efKfg\nF5Fv9O4d1u195hn4n//Rur35KqXgN7MBZrbIzKrM7IJ6th9vZnPM7HUze97MypK27WBm95nZQjNb\nYGb7pvMHEJH0Ou44GDcOJk0K8/tI/tnkqB4zKwKuBw4BlgGvmNk0d5+ftNtbwAHu/omZHQpMAHom\ntv0BeMzdf2ZmzYFt0/oTiEjajRsXmrwXXhiavj/7WeyKJJ1SOePfB6hy9yXuvhq4G6hI3sHdn3f3\nTxIvXwTaAJjZ9kBf4JbEfqvd/dN0FS8imWEGt94K++0HJ5wAL78cuyJJp1SCvzWwNOn1ssR7G3MK\n8GjieQdgBXCbmb1qZhPNrEV9HzKzEWZWaWaVK1asSKEsEcmkrbcOzd6dd4YjjoB3341dkaRLWpu7\nZnYQIfh/nXirKbAXcKO79wC+BDboEQC4+wR3L3f38pKSknSWJSKbqWbd3q++0rq9+SSV4F8OtE16\n3SbxXi1m1g2YCFS4+8rE28uAZe7+UuL1fYQDgYjkiK5dw3w+8+fD0KFatzcfpBL8rwClZtYh0Zwd\nAkxL3sHMdgWmAie4+zf3/bn7B8BSM+uceKs/kNwUFpEccMghcMMN8Pe/w/nnx65GttQmR/W4e7WZ\nnQk8DhQBt7r7PDMbmdh+EzAW+C5wg5kBVLt7eeJ/4ixgcuKgsQQ4Of0/hohk2ogRYUK3a68N6/ae\ncUbsimRzmWfhHRrl5eVeWVkZuwwRqWPt2nBX78MPwyOPwIABsSuSGmY2M+mE+1vpzl0RSVnNur3d\nusGxx8Lrr8euSDaHgl9EGmS77cLSjS1bat3eXKXgF5EGa9MmhP9HH4WVvL76KnZF0hAKfhHZLHvt\nFS77aN3e3KPgF5HNduSRcOWVYSGXsWNjVyOp0tKLIrJFzj8/TOh22WVh3d7hw2NXJJuiM34R2SJm\ncP310L8/nHYaTJ8euyLZFAW/
iGyxmnV7d9strNu7eHHsiuTbKPhFJC123DHc2GWmdXuznYJfRNJm\nt93CVM5vvw1HH611e7OVmrsiklZ9+oRFXIYNg4MOCou5dOoUGr+lpdC6NTTRKWdUCn4RSbvjjw+X\nev78Z7juOvj66/Xbtt46fDOoORAkHxR22UUHhcagSdpEJKPWrYPly0PDd/FiqKpa//zNN2HVqvX7\nbrPN+oNC8gGhUycdFDalIZO06YxfRDKqSRNo2zY8+vWrvW3dOli2bMMDwsKFYfbP5B7BNtusPxjU\nd1AIM8JLKhT8IhJNkyaw667h0b9/7W1r19Z/UFiwIIweSj4obLttOAAkHxBqDgo776yDQl0KfhHJ\nSkVF0K5deBx8cO1ta9fC0qUbHhTmzw+Tx61Zs37fFi02PCjUPP/+9wvzoKDgF5GcU1QE7duHxyGH\n1N62di28++6GB4W5c2HatPoPCnW/JZSWwve+l78HBQW/iOSVoiLo0CE8fvzj2tuqq8NBIfmAsHgx\nzJkT7j9IXkh+u+3q/5ZQWgo77ZTbBwUFv4gUjKZNoWPH8KjvoPDOO7UPClVVMHs2PPBA7YNCy5Yb\nPyiUlGT/QUHBLyJCOCjstlt4/OQntbetWVP/QWHWLLj//nB5qUarVvWPPiotheLi7DgoaBy/iMgW\nWLMmTFFR96CweHF4v+5Bob5vCZ06bflBoSHj+BX8IiIZUnNQSO4nJB8Uklct2377sIj9M89s3gFA\nN3CJiGSBZs3Wn9XXtXp17YNCVVV4rzEuBSn4RUQiaN4cdt89PBqbZr4QESkwCn4RkQKj4BcRKTAK\nfhGRAqPgFxEpMAp+EZECo+AXESkwCn4RkQKTlVM2mNkK4J3N/Hgx8FEay0kX1dUwqqthVFfD5GNd\n7dy9JJUdszL4t4SZVaY6X0VjUl0No7oaRnU1TKHXpUs9IiIFRsEvIlJg8jH4J8QuYCNUV8OoroZR\nXQ1T0HXl3TV+ERH5dvl4xi8iIt8ip4PfzG41sw/NbG7Se98xsyfNbHHizx2zpK5jzGyema0zsyij\nCTZS11VmttDM5pjZA2a2Q5bUNT5R02wze8LMdsmGupK2nW9mbmbF2VCXmV1sZssTv6/ZZnZYNtSV\neP+sxL+xeWZ2ZTbUZWb3JP2u3jaz2VlSV3czezFRV6WZ7ZOJvzungx+YBAyo894FwFPuXgo8lXjd\n2CaxYV1zgaOA6Y1ezXqT2LCuJ4E93b0b8AZwYWMXRf11XeXu3dy9O/AwMLbRq6q/LsysLfBj4N3G\nLihhEvXUBfze3bsnHn9v5JqgnrrM7CCgAihz9x8AV2dDXe4+uOZ3BdwPTM2GuoArgUsSdY1NvE67\nnA5+d58OfFzn7Qrg9sTz24EjG7Uo6q/L3Re4+6LGrqVODfXV9YS7Vydevgi0yZK6Pk962QJo9GbU\nRv59Afwe+BURaoJvrSuqjdR1OnCFu69K7PNhltQFgJkZcCxwV6MWxUbrcqBV4vn2wHuZ+LtzOvg3\n4nvu/n7i+QfA92IWk2N+Djwau4gaZnaZmS0FjifOGf8GzKwCWO7ur8WupR5nJS6P3RrjEudG7A7s\nb2YvmdkzZvaj2AXVsT/wb3dfHLuQhHOBqxL/7q8mQ9/A8zH4v+FhyJKGLaXAzEYD1cDk2LXUcPfR\n7t6WUNOZsesxs22BUWTJQaiOG4GOQHfgfeCauOV8oynwHaAX8EtgSuIsO1sMJcLZ/rc4HTgv8e/+\nPOCWTPwl+Rj8/zaznQESfzb6V8tcY2YnAT8FjvfsHN87GTg6dhHAbkAH4DUze5twWWyWmX0/alWA\nu//b3de6+zrgZiAjTcHNsAyY6sHLwDrCfDTRmVlTQt/tnti1JBnO+n7DvWTov2M+Bv80wi+PxJ9/\ni1hL1jOzAYTr1Ue4+39j11PDzEqTXlYAC2PVUsPdX3f3ndy9vbu3J4TaXu7+QeTSak5yagwiDC
bI\nBg8CBwGY2e5Ac7JncrSDgYXuvix2IUneAw5IPO8HZOYSlLvn7IPwFe19YA3h/4SnAN8ljOZZDPwD\n+E6W1DUo8XwV8G/g8SypqwpYCsxOPG7KkrruJ4TXHOAhoHU21FVn+9tAcTbUBdwBvJ74fU0Dds6S\nupoDdyb+W84C+mVDXYn3JwEjG7ueTfy++gAzgdeAl4C9M/F3685dEZECk4+XekRE5Fso+EVECoyC\nX0SkwCj4RUQKjIJfRKTAKPhFRAqMgl9EpMAo+EVECsz/AySuEjMiCkytAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x114dc1650>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "Ks2 = np.arange(10,20,2)\n",
    "CH_scores_pca2 = []\n",
    "\n",
    "for K in Ks2:\n",
    "    ch = K_cluster_analysis(K, X_train_pca)\n",
    "    CH_scores_pca2.append(ch)\n",
    "    \n",
    "\n",
    "plt.plot(Ks2, np.array(CH_scores_pca2), 'b-')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 总结\n",
    "\n",
    "1. 做 pca 有效果，可以试着做一下\n",
    "2. 数据标准化在当前数据模型里无效，不建议做\n",
    "3. 实验结果 K = 10 时， hscore 最高"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
